
Optimizing Orthogonal Multiple Access based on 
Quantized Channel State Information 

Antonio G. Marques, Georgios B. Giannakis, and Javier Ramos 

Abstract 

The performance of systems where multiple users communicate over wireless fading links benefits from channel-adaptive allocation of the available resources. Unlike most existing approaches that allocate resources based on perfect channel state information, this work optimizes channel scheduling along with per-user rate and power loadings over orthogonal fading channels, when both terminals and scheduler rely on quantized channel state information. Channel-adaptive policies are designed to optimize an average transmit-performance criterion subject to average quality of service requirements. While the resultant optimal policy per fading realization shows that the individual rate and power loadings can be obtained separately for each user, the optimal scheduling is slightly more complicated. Specifically, per fading realization each channel is allocated either to a single (winner) user or to a small group of winner users whose percentage of shared resources is found by solving a linear program. A single scheduling scheme combining both alternatives becomes possible by smoothing the original disjoint scheme. The smooth scheduling is asymptotically optimal and incurs reduced computational complexity. Different alternatives to obtain the Lagrange multipliers required to implement the channel-adaptive policies are proposed, including stochastic iterations that are provably convergent and do not require knowledge of the channel distribution. The development of the optimal channel-adaptive allocation is complemented with discussions on the overhead required to implement the novel policies.

I. Introduction 

The importance of channel-adaptive allocation of bandwidth, rate, and power resources in wireless multiuser access over fading links has been well documented from both information-theoretic and practical communication perspectives [2]. Per fading realization, parameters including rate, power, and percentages of time frames (or system subcarriers) are adjusted across users to optimize utility measures of performance quantified by bit error rate (BER), weighted sum-rate, or power efficiency, under quality of service (QoS) constraints such as prescribed BER, delay, maximum power, or minimum rate requirements. To carry out such constrained optimization tasks, most existing approaches assume that perfect CSI (P-CSI) is available wherever needed [17], [6], [9], [10], [19], [21]. However, it is well appreciated that errors in estimating the channel, feedback delay, and the asymmetry between forward and reverse links render acquisition of deterministically perfect CSI at transmitters (P-CSIT) impossible in most wireless scenarios [8]. For cases where the scheduling takes place at the receiver, this has motivated scheduling and resource allocation schemes using perfect CSI at the receivers (P-CSIR) but only quantized CSI at the transmitters (Q-CSIT), which can be pragmatically obtained through finite-rate feedback from the receiver; see, e.g., [13], [18], and also [11] for a recent review on finite-rate feedback systems.

This work goes one step further to pursue optimal scheduling and resource allocation for orthogonal multi-access transmissions over fading links when only Q-CSI is available at the scheduler (as in, e.g., [5] for the non-orthogonal multiple-input multiple-output (MIMO) case), while transmitters have either perfect or quantized CSI. The unifying approach minimizes an average power cost (or, in a dual formulation, maximizes an average rate utility) subject to average QoS constraints on rate (respectively, power). This setup is particularly suited for systems where the receiver does not have accurate channel estimates (e.g., when differential (de-)modulation is employed or when the fading channel varies fast). It is also pertinent in distributed set-ups (sensor networks or cellular downlink communications), where the scheduler (fusion center, access point) is not the receiver and can only acquire Q-CSI sent by the terminals. The distinct features of this paper are:

• Optimal resource allocation schemes that adapt rate, power, and user scheduling as a function of the 
instantaneous Q-CSI. 

• The optimal rate and power loadings per user terminal depend on the Q-CSI corresponding to its 
own fading realization, its relative contribution to the power cost (quantified through a user-dependent 
priority weight), and its rate requirement. 

• The optimal scheduling per channel boils down to one out of two modes: (i) a single user accessing 
the channel; or, (ii) a small set of users sharing the channel. The channel access coefficients under 
(ii) are obtained as the solution of a linear program. This bimodal policy emerges not only in systems 
that operate based on Q-CSI, but also in those that rely on P-CSI but operate over channels whose 
probability density function (pdf) contains deltas (e.g., discrete random channels or deterministic 
channels). 

• A novel asymptotically optimum scheduling scheme facilitating convergence and reducing complexity. This scheme combines the aforementioned cases (i) and (ii), and only incurs an ε-loss relative to the optimal solution (with ε representing a small positive number).

• Stochastic allocation schemes that are provably convergent, without requiring knowledge of the 
channel distribution, while reducing the complexity of the overall design. 

• Identification of operating conditions under which the system overhead can be reduced.

In addition, the approach here unifies notation at the receiving and transmitting ends, and clarifies the model when Q-CSI is available, yielding valuable insights for improved understanding of channel-adaptive resource allocation and finite-rate feedback.

The rest of the paper is organized as follows. After modeling preliminaries in Section II, the general problem is formulated in Section II-A and the optimal solution is characterized in Section III. Algorithms to obtain the optimum Lagrange multipliers needed to implement the optimal policies are developed in Section IV. Those algorithms rely on a novel smooth scheduling policy that reduces complexity and guarantees asymptotic optimality. Stochastic scheduling algorithms that do not require knowledge of the channel distribution are also developed. Section V provides examples and insights on the practical implementation of the novel channel-adaptive schemes. Numerical tests corroborating the analytical claims are described in Section VI, and concluding remarks are offered in Section VII.¹

II. Preliminaries and Problem Statement 

Consider a wireless network with M user terminals, indexed by $m \in \{1,\ldots,M\}$, transmitting over K flat-fading orthogonal channels, indexed by $k \in \{1,\ldots,K\}$, to a common destination, e.g., a fusion center or an access point. Zero-mean additive white Gaussian noise (AWGN) with unit variance is assumed present at the receiver. With $g_{m,k}$ denoting the kth channel's instantaneous gain (magnitude square of the fading coefficient) between the mth user and the destination, the overall channel is described by the $M \times K$ gain matrix $\mathbf{G}$ for which $[\mathbf{G}]_{m,k} := g_{m,k}$. The range of values each $g_{m,k}$ takes is divided into non-overlapping regions; and instead of $g_{m,k}$ itself, destination and transmitters have available only the binary codeword indexing the region $g_{m,k}$ falls into. With $j_{m,k}$ representing the corresponding region index, the $M \times K$ matrix $\mathbf{J}$ with entries $[\mathbf{J}]_{m,k} := j_{m,k}$ constitutes the Q-CSI of the overall system. Since $g_{m,k}$ is random, $j_{m,k}$ is also a discrete random variable; and likewise $\mathbf{J}$ is random, taking matrix values from a set $\mathcal{J}$ with finite cardinality $|\mathcal{J}|$.

As in [21], [13], [9] or [19], users at the outset can be scheduled to access simultaneously but orthogonally (in time or frequency) any of the K channels. The channel scheduling policy is described by an $M \times K$ matrix $\mathbf{W}$ whose nonnegative entry $[\mathbf{W}]_{m,k}$ corresponds to the percentage of the kth channel scheduled for the mth user. Clearly, it holds that $\sum_{m=1}^{M}[\mathbf{W}]_{m,k} \in [0,1]\ \forall k$. The power and rate resources of all terminal-channel pairs are collected in $M \times K$ matrices $\mathbf{P}$ and $\mathbf{R}$, respectively. The entries $[\mathbf{P}]_{m,k}$ and $[\mathbf{R}]_{m,k}$ represent, respectively, the nominal power and rate the mth user terminal would be allocated if it were the only terminal scheduled to transmit over the kth channel. Note that such entries are lower bounded by zero and upper bounded by the maximum nominal power and rate that the hardware of the system is able to implement. Since scheduling and allocation will be adapted based on Q-CSI, matrices $\mathbf{W}$, $\mathbf{P}$ and $\mathbf{R}$ will depend on $\mathbf{J}$, and each can take at most $|\mathcal{J}|$ different values. Under prescribed BER or capacity constraints, rate and power variables are coupled. This power-rate coupling will be represented by a function $\mathcal{T}$ (respectively, $\mathcal{T}^{-1}$ for the rate-power coupling), which relates $[\mathbf{P}]_{m,k}$ to $[\mathbf{R}]_{m,k}$ over the same Q-CSI region $\mathcal{R}([\mathbf{J}]_{m,k})$. (Wherever needed, we will write $\mathcal{T}_{\mathcal{R}([\mathbf{J}]_{m,k})}$ to make this dependence explicit.)

A. Problem Formulation 

Given the Q-CSI matrix $\mathbf{J}$ and prescribed QoS requirements, the goal is to find $\mathbf{W}(\mathbf{J})$, $\mathbf{P}(\mathbf{J})$ and $\mathbf{R}(\mathbf{J})$ so that the overall average weighted performance is optimized. (Overall here refers to performance of all users, and weighted refers to different user priorities effected through a preselected weight vector $\boldsymbol{\mu} := [\mu_1,\ldots,\mu_M]^T$ with nonnegative entries.) Depending on desirable objectives, the problem can be formulated either as constrained utility maximization of the average weighted sum-rate subject to average power constraints, or as a constrained minimization of the average weighted power subject to average rate constraints. The former fits the classical rate (capacity) maximization, while the latter is particularly relevant in energy-limited scenarios (e.g., sensor networks) where power savings are the main objective. Although this paper will use the power minimization formulation, the rate maximization problem can be tackled readily by dual substitutions; namely, after interchanging the roles of $\mathbf{R}$ and $\mathcal{T}_{\mathcal{R}([\mathbf{J}]_{m,k})}$ by $\mathbf{P}$ and $\mathcal{T}^{-1}_{\mathcal{R}([\mathbf{J}]_{m,k})}$, respectively.

¹Notation: Boldface upper (lower) case letters are used for matrices (column vectors); $(\cdot)^T$ denotes transpose; $[\cdot]_{k,l}$ the $(k,l)$th entry of a matrix, and $[\cdot]_k$ the $k$th column (entry) of a matrix (vector); $\odot$ stands for entrywise (Hadamard) matrix product; $\dot{(\cdot)}$ denotes differentiation; $\mathbf{1}$ and $\mathbf{0}$ are the all-one and all-zero matrices. Calligraphic letters are used for sets, with $|\mathcal{X}|$ denoting the cardinality of the set $\mathcal{X}$. For a random scalar (matrix) variable $x$ ($\mathbf{X}$), the univariate (multivariate) probability density function (pdf) is denoted by $f_x(x)$ (respectively, $f_{\mathbf{X}}(\mathbf{X})$). Finally, $\wedge$ ($\vee$) denotes the "and" ("or") logic operator, $x^*$ the optimal value of variable $x$; and $\mathbb{1}_{\{x\}}$ the indicator function ($\mathbb{1}_{\{x\}} = 1$ if $x$ is true and zero otherwise).

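Specifically, the weighted average transmit-power will be minimized subject to individual minimum average rate constraints collected in the vector $\check{\mathbf{r}} := [\check{r}_1,\ldots,\check{r}_M]^T$. Per Q-CSI realization $\mathbf{J}$, the overall weighted transmit-power is given by $\sum_{m=1}^{M}[\boldsymbol{\mu}]_m\sum_{k=1}^{K}[\mathbf{P}(\mathbf{J})]_{m,k}[\mathbf{W}(\mathbf{J})]_{m,k}$, while the mth user's transmit-rate is $\sum_{k=1}^{K}[\mathbf{R}(\mathbf{J})]_{m,k}[\mathbf{W}(\mathbf{J})]_{m,k}$. Using the probability mass function $\Pr\{\mathbf{J}\}$, these expressions can be used to obtain the average transmit-power and transmit-rate. For a given channel quantizer, i.e., with $\mathcal{R}$ fixed, and the fading pdf assumed known, $\Pr\{\mathbf{J}\}$ can be obtained as $\Pr\{\mathbf{J}\} = \int_{\mathcal{R}(\mathbf{J})} f_{\mathbf{G}}(\mathbf{G})\,d\mathbf{G}$, where $\mathcal{R}(\mathbf{J})$ represents the region of the $\mathbf{G}$ domain such that all $\mathbf{G} \in \mathcal{R}(\mathbf{J})$ are quantized as $\mathbf{J}$. Since $\mathcal{T}_{\mathcal{R}([\mathbf{J}]_{m,k})}$ links $\mathbf{R}$ with $\mathbf{P}$, it suffices to optimize over only one of them. Note also that the product $[\mathbf{R}(\mathbf{J})]_{m,k}[\mathbf{W}(\mathbf{J})]_{m,k}$ is not jointly convex with respect to (w.r.t.) $\mathbf{R}(\mathbf{J})$ and $\mathbf{W}(\mathbf{J})$. For this reason, we will instead consider the auxiliary variable $[\tilde{\mathbf{R}}(\mathbf{J})]_{m,k} := [\mathbf{R}(\mathbf{J})]_{m,k}[\mathbf{W}(\mathbf{J})]_{m,k}$ and seek allocation and scheduling matrices solving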
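the following optimization problem:

$$
\begin{aligned}
\min_{\{\tilde{\mathbf{R}}(\mathbf{J}),\,\mathbf{W}(\mathbf{J})\}_{\forall\mathbf{J}}}\ & \sum_{\forall\mathbf{J}\in\mathcal{J}}\left(\sum_{m=1}^{M}[\boldsymbol{\mu}]_m\sum_{k=1}^{K}\mathcal{T}_{\mathcal{R}([\mathbf{J}]_{m,k})}\!\left(\frac{[\tilde{\mathbf{R}}(\mathbf{J})]_{m,k}}{[\mathbf{W}(\mathbf{J})]_{m,k}}\right)[\mathbf{W}(\mathbf{J})]_{m,k}\right)\Pr\{\mathbf{J}\} \\
\text{s. to:}\ & \sum_{\forall\mathbf{J}\in\mathcal{J}}\left(\sum_{k=1}^{K}[\tilde{\mathbf{R}}(\mathbf{J})]_{m,k}\right)\Pr\{\mathbf{J}\} \geq [\check{\mathbf{r}}]_m,\quad \forall m \\
& \sum_{m=1}^{M}[\mathbf{W}(\mathbf{J})]_{m,k} \leq 1,\quad \forall \mathbf{J},k \\
& \tilde{\mathbf{R}}(\mathbf{J}) \geq \mathbf{0},\ \mathbf{W}(\mathbf{J}) \geq \mathbf{0},\quad \forall \mathbf{J}
\end{aligned}
\tag{1}
$$
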
Appendix A shows that if $\mathcal{T}_{\mathcal{R}([\mathbf{J}]_{m,k})}$ is a convex function, then problem (1) is convex. Throughout this paper it will be assumed that:

(as1) the power-rate function $\mathcal{T}_{\mathcal{R}([\mathbf{J}]_{m,k})}$ is increasing and strictly convex.

This assumption generally holds for orthogonal access, but not, for example, when multiuser interference is present. Note also that (as1) implies that the rate-power function $\mathcal{T}^{-1}$ is increasing and strictly concave. To justify the adoption of (as1), consider the following example of $\mathcal{T}$.

Example 1: For simplicity, the tractable case of outage capacity will be considered here, postponing the case of ergodic capacity to Section IV-D. Suppose that we want the outage probability of the mth user over the kth channel for a given Q-CSI $\mathbf{J}$ to be $\delta$. Define the $\delta$-outage channel gain for the $(m,k)$ pair in $\mathcal{R}([\mathbf{J}]_{m,k})$ as $g^{\delta}_{m,k}([\mathbf{J}]_{m,k})$, so that $\Pr\{g_{m,k} < g^{\delta}_{m,k}([\mathbf{J}]_{m,k}) \mid g_{m,k} \in \mathcal{R}([\mathbf{J}]_{m,k})\} = \delta$. Then, using Shannon's capacity formula, the rate-power function can be written as $\mathcal{T}^{-1}_{\mathcal{R}([\mathbf{J}]_{m,k})}(x) = \log_2\!\left(1 + g^{\delta}_{m,k}([\mathbf{J}]_{m,k})\,x\right)$. Solving the previous expression w.r.t. $x$ yields the power-rate function $\mathcal{T}_{\mathcal{R}([\mathbf{J}]_{m,k})}(x) = (2^x - 1)/g^{\delta}_{m,k}([\mathbf{J}]_{m,k})$, which is indeed increasing and strictly convex, as required by (as1).
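A small numerical check of Example 1 may help. The Python sketch below assumes Rayleigh fading, i.e., an exponentially distributed gain $g_{m,k}$ with a known mean, which the example itself does not specify; the function names are illustrative only. It computes the $\delta$-outage gain of a quantization region $[a, b)$ by inverting the conditional distribution, and pairs it with the rate-power and power-rate functions of Example 1.

```python
import math

def outage_gain(delta, a, b, mean_gain):
    """delta-outage gain g^delta of a link whose gain is exponential with mean
    `mean_gain` (Rayleigh fading, an assumption), conditioned on the gain lying
    in the quantization region [a, b); b may be math.inf."""
    ea = math.exp(-a / mean_gain)
    eb = math.exp(-b / mean_gain)  # evaluates to 0.0 when b is math.inf
    # Solve Pr{g < x | a <= g < b} = (ea - exp(-x/mean)) / (ea - eb) = delta for x.
    return -mean_gain * math.log(ea - delta * (ea - eb))

def rate_from_power(p, g_delta):
    """Rate-power function of Example 1: log2(1 + g_delta * p)."""
    return math.log2(1.0 + g_delta * p)

def power_from_rate(r, g_delta):
    """Power-rate function of Example 1: (2**r - 1) / g_delta."""
    return (2.0 ** r - 1.0) / g_delta
```

For instance, `power_from_rate(1.0, g)` lies below the average of the powers needed for rates 0 and 2, a quick numerical instance of the strict convexity required by (as1).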

Before moving to the next section, where the solution of (1) will be characterized, it is important to stress that since $\mathcal{R}$ is involved in specifying $\Pr\{\mathbf{J}\}$ and $\mathcal{T}_{\mathcal{R}([\mathbf{J}]_{m,k})}$, the choice of $\mathcal{R}$ affects the optimum allocation. Selecting the quantization regions to optimize (1) is thus of interest but goes beyond the scope of this paper. Near-optimal channel quantizers for time division multiple access (TDMA) and orthogonal frequency-division multiple access (OFDMA) can be found in [18] and [13], respectively.
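To make the dependence of $\Pr\{\mathbf{J}\}$ on the quantizer $\mathcal{R}$ concrete, the following Python sketch enumerates every Q-CSI matrix together with its probability. Beyond what the text states, it assumes that the gains are independent across links and exponentially distributed, and that all links share the same thresholds, so that $\Pr\{\mathbf{J}\}$ factorizes into per-link region probabilities; the helper names are hypothetical.

```python
import itertools
import math

def region_probs(thresholds, mean_gain):
    """Probability that an exponential gain with mean `mean_gain` falls in each
    region [t_j, t_{j+1}), for thresholds t_0 = 0 < t_1 < ...; the last region
    extends to infinity."""
    edges = list(thresholds) + [math.inf]
    return [math.exp(-lo / mean_gain) - math.exp(-hi / mean_gain)
            for lo, hi in zip(edges[:-1], edges[1:])]

def qcsi_pmf(thresholds, mean_gains):
    """Pr{J} for every Q-CSI matrix J, assuming independent links.
    mean_gains[m][k] is the mean gain of link (m, k); each J is returned as a
    tuple of region indices flattened row-wise (user-major)."""
    per_link = [region_probs(thresholds, mg) for row in mean_gains for mg in row]
    pmf = {}
    for idx in itertools.product(*(range(len(p)) for p in per_link)):
        pmf[idx] = math.prod(p[j] for p, j in zip(per_link, idx))
    return pmf
```

With Q regions per link the dictionary has $Q^{MK}$ entries, which matches the count $|\mathcal{J}| = 10^4$ used in Example 2 for Q = 10, M = 4 and K = 1, and shows why exhaustive enumeration is only practical for small systems.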
III. Optimum Resource Allocation

In this section, the optimum $\mathbf{W}$, $\mathbf{P}$ and $\mathbf{R}$ matrices will be characterized as a function of $\mathbf{J}$ and the optimum multipliers of the constrained optimization problem in (1).

Let $\boldsymbol{\lambda}^R$ denote the $M \times 1$ vector whose mth entry is the non-negative Lagrange multiplier associated with the mth average rate constraint; and $\boldsymbol{\lambda}^W(\mathbf{J})$ the $K \times 1$ vector whose kth entry corresponds to the kth channel-sharing constraint per Q-CSI matrix $\mathbf{J}$.² Let also $\boldsymbol{\alpha}^R(\mathbf{J})$ and $\boldsymbol{\alpha}^W(\mathbf{J})$ denote $M \times K$ matrices whose entries are, correspondingly, the non-negative Lagrange multipliers associated with the constraints $[\tilde{\mathbf{R}}(\mathbf{J})]_{m,k} \geq 0$ and $[\mathbf{W}(\mathbf{J})]_{m,k} \geq 0$. The full Lagrangian of (1) can be written as

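$$
\begin{aligned}
&\mathcal{L}(\boldsymbol{\lambda}^R,\boldsymbol{\lambda}^W(\mathbf{J}),\boldsymbol{\alpha}^R(\mathbf{J}),\boldsymbol{\alpha}^W(\mathbf{J}),\tilde{\mathbf{R}}(\mathbf{J}),\mathbf{W}(\mathbf{J})) := \sum_{\forall\mathbf{J}\in\mathcal{J}}\left(\sum_{m=1}^{M}\sum_{k=1}^{K}[\boldsymbol{\mu}]_m\,\mathcal{T}_{\mathcal{R}([\mathbf{J}]_{m,k})}\!\left(\frac{[\tilde{\mathbf{R}}(\mathbf{J})]_{m,k}}{[\mathbf{W}(\mathbf{J})]_{m,k}}\right)[\mathbf{W}(\mathbf{J})]_{m,k}\right)\Pr\{\mathbf{J}\} \\
&\quad - \sum_{m=1}^{M}[\boldsymbol{\lambda}^R]_m\left(\sum_{\forall\mathbf{J}\in\mathcal{J}}\left(\sum_{k=1}^{K}[\tilde{\mathbf{R}}(\mathbf{J})]_{m,k}\right)\Pr\{\mathbf{J}\} - [\check{\mathbf{r}}]_m\right) + \sum_{\forall\mathbf{J}\in\mathcal{J}}\sum_{k=1}^{K}[\boldsymbol{\lambda}^W(\mathbf{J})]_k\left(\sum_{m=1}^{M}[\mathbf{W}(\mathbf{J})]_{m,k} - 1\right) \\
&\quad - \sum_{\forall\mathbf{J}\in\mathcal{J}}\sum_{m=1}^{M}\sum_{k=1}^{K}\left([\boldsymbol{\alpha}^R(\mathbf{J})]_{m,k}[\tilde{\mathbf{R}}(\mathbf{J})]_{m,k} + [\boldsymbol{\alpha}^W(\mathbf{J})]_{m,k}[\mathbf{W}(\mathbf{J})]_{m,k}\right)
\end{aligned}
\tag{2}
$$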

Because (1) is convex, the Karush-Kuhn-Tucker (KKT) conditions yield the following necessary and sufficient conditions of optimality [1] (recall that $\dot{x}$ denotes the derivative of $x$):

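$$[\boldsymbol{\mu}]_m\,\dot{\mathcal{T}}_{\mathcal{R}([\mathbf{J}]_{m,k})}\!\left(\frac{[\tilde{\mathbf{R}}^*(\mathbf{J})]_{m,k}}{[\mathbf{W}^*(\mathbf{J})]_{m,k}}\right)\Pr\{\mathbf{J}\} - [\boldsymbol{\lambda}^{R*}]_m\Pr\{\mathbf{J}\} - [\boldsymbol{\alpha}^{R*}(\mathbf{J})]_{m,k} = 0 \tag{3}$$

$$[\tilde{\mathbf{R}}^*(\mathbf{J})]_{m,k}\,[\boldsymbol{\alpha}^{R*}(\mathbf{J})]_{m,k} = 0 \tag{4}$$

$$
\begin{aligned}
&[\boldsymbol{\mu}]_m\,\mathcal{T}_{\mathcal{R}([\mathbf{J}]_{m,k})}\!\left(\frac{[\tilde{\mathbf{R}}^*(\mathbf{J})]_{m,k}}{[\mathbf{W}^*(\mathbf{J})]_{m,k}}\right)\Pr\{\mathbf{J}\} - [\boldsymbol{\mu}]_m\,\dot{\mathcal{T}}_{\mathcal{R}([\mathbf{J}]_{m,k})}\!\left(\frac{[\tilde{\mathbf{R}}^*(\mathbf{J})]_{m,k}}{[\mathbf{W}^*(\mathbf{J})]_{m,k}}\right)\frac{[\tilde{\mathbf{R}}^*(\mathbf{J})]_{m,k}}{[\mathbf{W}^*(\mathbf{J})]_{m,k}}\Pr\{\mathbf{J}\} \\
&\quad - [\boldsymbol{\alpha}^{W*}(\mathbf{J})]_{m,k} + [\boldsymbol{\lambda}^{W*}(\mathbf{J})]_k = 0
\end{aligned}
\tag{5}
$$

$$[\mathbf{W}^*(\mathbf{J})]_{m,k}\,[\boldsymbol{\alpha}^{W*}(\mathbf{J})]_{m,k} = 0. \tag{6}$$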

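Conditions (3)-(6) can be used to characterize the optimal rate and channel allocation as follows.

Proposition 1: The optimum rate allocation is given by:

(i) $[\tilde{\mathbf{R}}^*(\mathbf{J})]_{m,k} = 0$, if either $[\mathbf{W}^*(\mathbf{J})]_{m,k} = 0$ or $[\boldsymbol{\lambda}^{R*}]_m/[\boldsymbol{\mu}]_m < \dot{\mathcal{T}}_{\mathcal{R}([\mathbf{J}]_{m,k})}(0)$; otherwise,

(ii) the optimum rate allocation is

$$[\tilde{\mathbf{R}}^*(\mathbf{J})]_{m,k} = \dot{\mathcal{T}}^{-1}_{\mathcal{R}([\mathbf{J}]_{m,k})}\!\left(\frac{[\boldsymbol{\lambda}^{R*}]_m}{[\boldsymbol{\mu}]_m}\right)[\mathbf{W}^*(\mathbf{J})]_{m,k} \tag{7}$$

where $\dot{\mathcal{T}}^{-1}_{\mathcal{R}([\mathbf{J}]_{m,k})}$ denotes the inverse function of $\dot{\mathcal{T}}_{\mathcal{R}([\mathbf{J}]_{m,k})}$.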

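Proof: Consider first the claim in (i). The definition of $[\tilde{\mathbf{R}}^*(\mathbf{J})]_{m,k}$ implies that if $[\mathbf{W}^*(\mathbf{J})]_{m,k} = 0$, then $[\tilde{\mathbf{R}}^*(\mathbf{J})]_{m,k} = 0$. On the other hand, if $[\boldsymbol{\lambda}^{R*}]_m/[\boldsymbol{\mu}]_m < \dot{\mathcal{T}}_{\mathcal{R}([\mathbf{J}]_{m,k})}(0)$, then (3) can only be satisfied if $[\boldsymbol{\alpha}^{R*}(\mathbf{J})]_{m,k} > 0$ (because $\dot{\mathcal{T}}$ is increasing under (as1)). Using the slackness condition in (4), the latter implies $[\tilde{\mathbf{R}}^*(\mathbf{J})]_{m,k} = 0$. The proof of part (ii) is simpler and consists of solving (3) after excluding the two cases in (i); i.e., assuming that $[\mathbf{W}^*(\mathbf{J})]_{m,k} > 0$ and $[\boldsymbol{\alpha}^{R*}(\mathbf{J})]_{m,k} = 0$. Given the relationship between $\tilde{\mathbf{R}}$ and $\mathbf{R}$, the optimum transmit-rate for $[\mathbf{W}^*(\mathbf{J})]_{m,k} \neq 0$ is

$$[\mathbf{R}^*(\mathbf{J})]_{m,k} = \dot{\mathcal{T}}^{-1}_{\mathcal{R}([\mathbf{J}]_{m,k})}\!\left(\frac{[\boldsymbol{\lambda}^{R*}]_m}{[\boldsymbol{\mu}]_m}\right). \tag{8}$$

²The dependence of the multipliers associated with instantaneous constraints on $\mathbf{J}$ will be explicitly written throughout.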

In fact, (8) is also valid if $[\mathbf{W}^*(\mathbf{J})]_{m,k} = 0$. This is because when $[\mathbf{W}^*(\mathbf{J})]_{m,k} = 0$, any finite nominal rate yields $[\tilde{\mathbf{R}}^*(\mathbf{J})]_{m,k} = 0$, which is the optimal solution. Equation (8) shows that the optimal rate loading depends on the ratio $[\boldsymbol{\lambda}^{R*}]_m/[\boldsymbol{\mu}]_m$, where $[\boldsymbol{\mu}]_m$ represents the "priority" terminal m has in minimizing the total power cost, and $[\boldsymbol{\lambda}^{R*}]_m$ represents the price corresponding to its rate requirement. According to (as1), $\mathcal{T}$ is strictly convex, so its derivative $\dot{\mathcal{T}}$ is monotonically increasing, and so is the inverse $\dot{\mathcal{T}}^{-1}$ in (8). This implies that users with high $[\check{\mathbf{r}}]_m$ have high values of $[\boldsymbol{\lambda}^{R*}]_m$, thus higher rate and power loadings per region. Conversely, users whose power consumption is critical are assigned high values of $[\boldsymbol{\mu}]_m$, which result in low rate and power loadings per region. Part (i) of the proposition also dictates that there may be regions for which the optimum rate and power loadings are zero. Intuitively, this will typically happen for the region(s) whose channel conditions are so poor that the power cost of activating the region is too high.
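Under the outage model of Example 1 (an illustrative choice; (8) holds for any $\mathcal{T}$ satisfying (as1)), $\dot{\mathcal{T}}(x) = \ln 2 \cdot 2^x / g^{\delta}_{m,k}$, so Proposition 1 and (8) reduce to the closed form $[\mathbf{R}^*(\mathbf{J})]_{m,k} = \max\{0, \log_2([\boldsymbol{\lambda}^{R*}]_m\, g^{\delta}_{m,k}/([\boldsymbol{\mu}]_m \ln 2))\}$, with the argument of $g^{\delta}_{m,k}$ dropped for brevity. The Python sketch below evaluates it; the function name is hypothetical, and the checks that follow exercise the KKT condition (3) and the monotonicity statements above.

```python
import math

def optimal_rate(lam, mu, g_delta):
    """Nominal rate (8) under the outage model T(x) = (2**x - 1)/g_delta.

    T'(x) = ln(2) * 2**x / g_delta, hence T'^{-1}(y) = log2(y * g_delta / ln(2)).
    Proposition 1-(i): the rate is zero when lam/mu < T'(0) = ln(2)/g_delta.
    """
    ratio = lam / mu
    if ratio < math.log(2) / g_delta:   # region not worth activating
        return 0.0
    return math.log2(ratio * g_delta / math.log(2))
```

For example, a larger $\delta$-outage gain raises the loading, while a larger priority weight $[\boldsymbol{\mu}]_m$ lowers it, in line with the discussion above.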

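To find the optimum scheduling matrix $\mathbf{W}$, define first the functional

$$[\mathbf{C}_W(\mathbf{J})]_{m,k} := [\boldsymbol{\mu}]_m\,\mathcal{T}_{\mathcal{R}([\mathbf{J}]_{m,k})}\!\left([\mathbf{R}^*(\mathbf{J})]_{m,k}\right) - [\boldsymbol{\lambda}^{R*}]_m[\mathbf{R}^*(\mathbf{J})]_{m,k} \tag{9}$$

which represents the cost of scheduling channel k to user m when the Q-CSI is $\mathbf{J}$. This cost of selecting $[\mathbf{W}(\mathbf{J})]_{m,k} = 1$ also emerges in the first two terms of $\mathcal{L}$ in (2). Based on (9), and with $\wedge$ denoting the "and" operator, we define the $K \times 1$ vector $\mathbf{c}^*_w(\mathbf{J},\boldsymbol{\lambda}^R)$ with entries $[\mathbf{c}^*_w(\mathbf{J},\boldsymbol{\lambda}^R)]_k := \min_m\{[\mathbf{C}_W(\mathbf{J},\boldsymbol{\lambda}^R)]_{m,k}\}_{m=1}^{M}$, and the sets of "winner user(s)" $\mathcal{M}(\mathbf{J},k) := \{m : ([\mathbf{C}_W(\mathbf{J},\boldsymbol{\lambda}^R)]_{m,k} = [\mathbf{c}^*_w(\mathbf{J},\boldsymbol{\lambda}^R)]_k) \wedge ([\mathbf{c}^*_w(\mathbf{J},\boldsymbol{\lambda}^R)]_k < 0)\}$. Given the Q-CSI realization $\mathbf{J}$, $\mathcal{M}(\mathbf{J},k)$ is the set of user(s) that incur the minimum cost if scheduled to access channel k, while $[\mathbf{c}^*_w(\mathbf{J},\boldsymbol{\lambda}^R)]_k$ is the cost corresponding to those users. Using these notational conventions, it can be shown that: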

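Proposition 2: The optimum scheduling $\mathbf{W}^*(\mathbf{J})$ satisfies the following:

(i) If $[\mathbf{W}^*(\mathbf{J})]_{m,k} > 0$, then $m \in \mathcal{M}(\mathbf{J},k)$;

(ii) If $|\mathcal{M}(\mathbf{J},k)| > 0$, then $\sum_{m\in\mathcal{M}(\mathbf{J},k)}[\mathbf{W}^*(\mathbf{J})]_{m,k} = 1$; and

(iii) If $|\mathcal{M}(\mathbf{J},k)| = 0$, then $[\mathbf{W}^*(\mathbf{J})]_{m,k} = 0\ \forall m$.

Proof: Appendix B.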

In words, the optimal scheduler assigns the channel only to user(s) with minimum negative cost (9), which in most (but not all) cases is attained by a single user. This is a greedy policy because, absent ties, only the user with minimum cost is selected to transmit per Q-CSI realization, while others defer. Note that with P-CSIR, the optimum scheduling over orthogonal fading channels is also greedy, whether based on P-CSIT [9], [19] or Q-CSIT [13].
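The cost (9) and the winner set $\mathcal{M}(\mathbf{J},k)$ are simple to compute for one channel. The Python sketch below assumes the outage model of Example 1 for $\mathcal{T}$; the helper names are hypothetical. Ties are detected by exact equality, as in the definition of $\mathcal{M}(\mathbf{J},k)$, so two users with the same region, weight and multiplier tie.

```python
import math

LN2 = math.log(2)

def optimal_rate(lam, mu, g):
    """Nominal rate (8) for T(x) = (2**x - 1)/g (Example 1), clipped at 0."""
    if lam <= 0.0:
        return 0.0
    return max(0.0, math.log2(lam * g / (mu * LN2)))

def scheduling_cost(lam, mu, g):
    """Cost (9): mu*T(R*) - lam*R*, evaluated at the optimal nominal rate R*."""
    r = optimal_rate(lam, mu, g)
    return mu * (2.0 ** r - 1.0) / g - lam * r

def winner_set(costs):
    """Winner set M(J, k) of one channel: users attaining the minimum cost,
    provided that minimum is negative (otherwise the channel stays idle)."""
    c_star = min(costs)
    if c_star >= 0.0:
        return []                       # Proposition 2-(iii)
    return [m for m, c in enumerate(costs) if c == c_star]
```

A unique winner corresponds to Case 1 below; a winner set with more than one user is the tie of Case 2.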

Case 1 (Single winner user): When the minimum cost is attained by only one user, $\mathbf{W}^*$ in Proposition 2 can be written using the indicator function as

$$[\mathbf{W}^*(\mathbf{J})]_{m,k} = \mathbb{1}_{\{m\in\mathcal{M}(\mathbf{J},k)\}}. \tag{10}$$

Since $[\mathbf{C}_W(\mathbf{J})]_{m,k}$ is a function of different variables (namely, the quantization regions, the fading realization, the individual priority weight and the individual Lagrange multiplier), for most CSI realizations the costs corresponding to different users m are distinct, and the emerging winner is unique.
Case 2 (Multiple winners): The event of having different users attaining the minimum cost will henceforth be referred to as a "tie". The main difficulty with a tie is that Proposition 2-(ii) does not specify how the channel should be split among winner users (the underlying reason being that any such split minimizes $\mathcal{L}$). On the other hand, only a subset (for most realizations, one) of those splits is the actual solution to the original primal problem. To find the optimum schedule in this case, define first the matrix of single-winner scheduling as $[\mathbf{W}_{\text{one}}(\mathbf{J})]_{m,k} := [\mathbf{W}^*(\mathbf{J})]_{m,k}$ in (10) for all $(\mathbf{J},k)$ such that $|\mathcal{M}(\mathbf{J},k)| = 1$, and $[\mathbf{W}_{\text{one}}(\mathbf{J})]_{m,k} := 0$ otherwise. Define further the scheduling matrix with multiple winners as $[\mathbf{W}_{\text{tie}}(\mathbf{J})]_{m,k} := 0$ if $|\mathcal{M}(\mathbf{J},k)| \leq 1$, or if $|\mathcal{M}(\mathbf{J},k)| > 1$ but $m \notin \mathcal{M}(\mathbf{J},k)$; and $[\mathbf{W}_{\text{tie}}(\mathbf{J})]_{m,k} \in [0,1]$ otherwise. Finally, let the set of multiple-winner scheduling matrices be $\mathcal{W}_{\text{tie}} := \{\mathbf{W}_{\text{tie}}(\mathbf{J}) \mid \forall\mathbf{J}\}$; the average single-winner transmit-rate vector $[\check{\mathbf{r}}_{\text{one}}]_m := \sum_{\forall\mathbf{J}}\left(\sum_{k=1}^{K}[\mathbf{R}^*(\mathbf{J})]_{m,k}[\mathbf{W}_{\text{one}}(\mathbf{J})]_{m,k}\right)\Pr\{\mathbf{J}\}$; and $\check{\mathbf{r}}_{\text{tie}} := \check{\mathbf{r}} - \check{\mathbf{r}}_{\text{one}}$. Using these definitions, the optimum schedule $\mathbf{W}^*_{\text{tie}}(\mathbf{J})$ for all $(\mathbf{J},k)$ with $|\mathcal{M}(\mathbf{J},k)| > 1$ can be found as the solution of the following linear program:

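$$
\begin{aligned}
\min_{\mathcal{W}_{\text{tie}}}\ & \sum_{\forall\mathbf{J}}\sum_{m=1}^{M}\sum_{k=1}^{K}[\boldsymbol{\mu}]_m\,\mathcal{T}_{\mathcal{R}([\mathbf{J}]_{m,k})}\!\left([\mathbf{R}^*(\mathbf{J})]_{m,k}\right)[\mathbf{W}_{\text{tie}}(\mathbf{J})]_{m,k}\Pr\{\mathbf{J}\} \\
\text{s. to:}\ & \sum_{\forall\mathbf{J}}\left(\sum_{k=1}^{K}[\mathbf{R}^*(\mathbf{J})]_{m,k}[\mathbf{W}_{\text{tie}}(\mathbf{J})]_{m,k}\right)\Pr\{\mathbf{J}\} = [\check{\mathbf{r}}_{\text{tie}}]_m,\quad \forall m \\
& \sum_{m=1}^{M}[\mathbf{W}_{\text{tie}}(\mathbf{J})]_{m,k} = 1,\quad \forall(\mathbf{J},k): |\mathcal{M}(\mathbf{J},k)| > 1.
\end{aligned}
\tag{11}
$$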

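Note that in the optimization process, only the matrices $\mathbf{J}$ for which a tie occurs are considered, and for those only the entries of $\mathbf{W}_{\text{tie}}(\mathbf{J})$ that are not fixed to zero (i.e., those with $m \in \mathcal{M}(\mathbf{J},k)$) are optimized.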

The main idea behind (11) is that among all schedules minimizing the Lagrangian when a tie occurs (second constraint), the optimal one for the primal problem is the one for which the average rate constraints are satisfied with equality (first constraint). We stress that here $\mathbf{R}^*(\mathbf{J})$ (thus $\mathbf{P}^*(\mathbf{J})$) is fixed, and therefore optimization is carried out only over the channel-sharing coefficients for which a tie occurs (in general, a small set). To clarify this point, consider the following example.

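Example 2: Consider a system with K = 1 channel, M = 4 users and 10 regions per user. For such a system, the number of Q-CSI realizations is $|\mathcal{J}| = 10^4$. Suppose that, among those, ties occur for 3 different realizations, namely: when $\mathbf{J} = \mathbf{J}_1$ users 1 and 2 tie; when $\mathbf{J} = \mathbf{J}_2$ users 1, 3 and 4 tie; and when $\mathbf{J} = \mathbf{J}_3$ users 2 and 4 tie. In this case, the optimization in (11) has to be carried out over $[\mathbf{W}(\mathbf{J}_1)]_{1,1}$, $[\mathbf{W}(\mathbf{J}_1)]_{2,1}$, $[\mathbf{W}(\mathbf{J}_2)]_{1,1}$, $[\mathbf{W}(\mathbf{J}_2)]_{3,1}$, $[\mathbf{W}(\mathbf{J}_2)]_{4,1}$, $[\mathbf{W}(\mathbf{J}_3)]_{2,1}$, and $[\mathbf{W}(\mathbf{J}_3)]_{4,1}$. Once $\mathbf{W}^*_{\text{tie}}(\mathbf{J})$ is found, the overall optimal channel assignment is $[\mathbf{W}^*(\mathbf{J})]_{m,k} := [\mathbf{W}_{\text{one}}(\mathbf{J})]_{m,k}$ for $(\mathbf{J},k)$ with $|\mathcal{M}(\mathbf{J},k)| \leq 1$, and $[\mathbf{W}^*(\mathbf{J})]_{m,k} := [\mathbf{W}^*_{\text{tie}}(\mathbf{J})]_{m,k}$ otherwise.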
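The bookkeeping behind Example 2 can be made explicit. The Python sketch below (labels and helper names are hypothetical) lists the seven LP variables of (11) and builds the coefficient rows of its two constraint families. The nominal rates and probabilities passed to `rate_rows` are placeholders that, in practice, would come from (8) and $\Pr\{\mathbf{J}\}$; the rows, together with objective coefficients $[\boldsymbol{\mu}]_m\mathcal{T}([\mathbf{R}^*(\mathbf{J})]_{m,1})\Pr\{\mathbf{J}\}$, could then be passed to any LP solver.

```python
# Ties of Example 2 (K = 1, so the channel index is dropped); the labels
# "J1", "J2", "J3" stand for the three tied Q-CSI realizations.
ties = {"J1": [1, 2], "J2": [1, 3, 4], "J3": [2, 4]}
M = 4  # number of users

# One LP variable per entry [W_tie(J)]_{m,1} that is not fixed to zero.
variables = [(J, m) for J, users in ties.items() for m in users]
col = {v: i for i, v in enumerate(variables)}

# Second constraint of (11): the tied users share the whole channel (row sum = 1).
sharing_rows = [[1.0 if v[0] == J else 0.0 for v in variables] for J in ties]

def rate_rows(nominal_rate, prob):
    """First constraint of (11), one row per user m: the coefficient of the
    variable (J, m) is [R*(J)]_{m,1} * Pr{J}; row m must equal [r_tie]_m."""
    rows = [[0.0] * len(variables) for _ in range(M)]
    for (J, m), i in col.items():
        rows[m - 1][i] = nominal_rate[(J, m)] * prob[J]
    return rows
```

Here user 1, for example, appears in two variables, so its rate row couples the split of $\mathbf{J}_1$ with that of $\mathbf{J}_2$; this coupling across realizations is what makes (11) a joint program rather than seven independent choices.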

It is worth noticing that for every scenario where multiple users access the channel orthogonally, the optimum scheduling needs to satisfy (11). However, neither [9], [19] (P-CSIR and P-CSIT) nor [13], [18] (P-CSIR and Q-CSIT) consider (11). This is because if the fading distributions are continuous and P-CSIR is available, the set of fading realizations $\mathbf{G}$ for which a tie occurs has Lebesgue measure zero. Therefore, any arbitrary channel scheduling among tied users is equally optimum. Indeed, the contribution of any specific $\mathbf{G}$ to the average performance when integrated over the channel pdf is zero. But when dealing with Q-CSI (or with deterministic fixed channels), neither the probability of a Q-CSI realization $\mathbf{J}$ nor its contribution to the average cost is negligible, and this precisely necessitates solving (11) to obtain the optimum schedule. Intuitively, as the number of regions and channels increases, sharing a channel becomes less likely, which in turn brings the solution closer to the continuous-fading P-CSIR case, and the effect of neglecting (11) becomes less harmful. The opposite behavior arises in systems that have P-CSIR but operate over deterministic (fixed) channels. In those systems ties will represent the prevailing channel allocation (e.g., for a deterministic TDMA system we have K = 1 and $|\mathcal{J}| = 1$; since all the users have to access the channel to satisfy their rate constraints, the entries of $\boldsymbol{\lambda}^{R*}$ will self-adjust so that a tie among all the users occurs). Only in systems operating over deterministic channels for which the number of channels is much higher than the number of users (e.g., an OFDMA system with many subcarriers) will the single-winner case constitute the predominant scheduling.

In the context of smooth optimization, the next section develops a single scheduling scheme that can be implemented for both Cases 1 and 2, is asymptotically optimal, incurs reduced computational burden, and facilitates computation of the optimal Lagrange multipliers.

IV. Optimal Lagrange Multipliers 

To implement the optimum scheduling and rate allocation policies presented in the previous section, the optimum multiplier vector $\boldsymbol{\lambda}^{R*}$ needs to be known. Since the rate constraints in (1) are always active, the KKT conditions imply that when $\boldsymbol{\lambda}^R = \boldsymbol{\lambda}^{R*}$ those constraints are satisfied with equality. Since $\boldsymbol{\lambda}^{R*}$ cannot be obtained analytically from this condition, numerical search is required. This is possible using dual methods. First, let us write³ a simplified version of the Lagrangian

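$$
\begin{aligned}
\mathcal{L}(\boldsymbol{\lambda}^R,\tilde{\mathbf{R}}(\mathbf{J}),\mathbf{W}(\mathbf{J})) :=\ & \sum_{\forall\mathbf{J}}\left(\sum_{m=1}^{M}[\boldsymbol{\mu}]_m\sum_{k=1}^{K}\mathcal{T}_{\mathcal{R}([\mathbf{J}]_{m,k})}\!\left(\frac{[\tilde{\mathbf{R}}(\mathbf{J})]_{m,k}}{[\mathbf{W}(\mathbf{J})]_{m,k}}\right)[\mathbf{W}(\mathbf{J})]_{m,k}\right)\Pr\{\mathbf{J}\} \\
& - \sum_{m=1}^{M}[\boldsymbol{\lambda}^R]_m\sum_{\forall\mathbf{J}}\left(\sum_{k=1}^{K}[\tilde{\mathbf{R}}(\mathbf{J})]_{m,k}\right)\Pr\{\mathbf{J}\} + \sum_{m=1}^{M}[\boldsymbol{\lambda}^R]_m[\check{\mathbf{r}}]_m
\end{aligned}
\tag{12}
$$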

where only the contribution of the average rate constraints is considered [cf. (2)]. Because all the instantaneous constraints (i.e., channel-sharing and non-negativity constraints) were already satisfied when obtaining the solution of the previous section, the focus here is to find $\boldsymbol{\lambda}^R$ so that the average rate constraints are satisfied. Let $\mathcal{F}(\mathbf{J})$ denote the feasible set of the rate and channel assignment matrices, namely $\mathcal{F}(\mathbf{J}) := \{(\tilde{\mathbf{R}}(\mathbf{J}),\mathbf{W}(\mathbf{J})) \mid \tilde{\mathbf{R}}(\mathbf{J}) \geq \mathbf{0} \wedge \mathbf{W}(\mathbf{J}) \geq \mathbf{0} \wedge \sum_{m=1}^{M}[\mathbf{W}(\mathbf{J})]_{m,k} \leq 1\ \forall k\}$. The dual function is then defined as

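$$
\begin{aligned}
D(\boldsymbol{\lambda}^R) &:= \inf_{(\tilde{\mathbf{R}}(\mathbf{J}),\mathbf{W}(\mathbf{J}))\in\mathcal{F}(\mathbf{J})}\mathcal{L}(\boldsymbol{\lambda}^R,\tilde{\mathbf{R}}(\mathbf{J}),\mathbf{W}(\mathbf{J})) \\
&= \mathcal{L}(\boldsymbol{\lambda}^R,\mathbf{R}^*(\mathbf{J},\boldsymbol{\lambda}^R)\odot\mathbf{W}^*(\mathbf{J},\boldsymbol{\lambda}^R),\mathbf{W}^*(\mathbf{J},\boldsymbol{\lambda}^R))
\end{aligned}
\tag{13}
$$

which is concave w.r.t. $\boldsymbol{\lambda}^R$. Based on (13), the dual problem of (1) is

$$\max_{\boldsymbol{\lambda}^R\geq\mathbf{0}} D(\boldsymbol{\lambda}^R). \tag{14}$$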

Since the problem in (1) is convex and strictly feasible, the duality gap between the primal and dual problems is zero. Thus, the value of $\boldsymbol{\lambda}^R$ optimizing (14) can be used to find the optimum primal solution. A standard approach to obtain $\boldsymbol{\lambda}^{R*}$ is to implement a subgradient iteration (a gradient iteration is impossible here because $D(\boldsymbol{\lambda}^R)$ is non-differentiable w.r.t. $[\boldsymbol{\lambda}^R]_m$). Let $\partial D(\boldsymbol{\lambda}^R)$ denote a subgradient vector of (13) whose mth entry is $[\partial D(\boldsymbol{\lambda}^R)]_m := [\check{\mathbf{r}}]_m - \sum_{\forall\mathbf{J}}\sum_{k=1}^{K}[\mathbf{R}^*(\mathbf{J},\boldsymbol{\lambda}^R)]_{m,k}[\mathbf{W}^*(\mathbf{J},\boldsymbol{\lambda}^R)]_{m,k}\Pr\{\mathbf{J}\}$; let also $i$ denote an iteration index, and $\beta^{(i)}$ a decreasing small stepsize such that $\sum_{i=1}^{\infty}\beta^{(i)} = \infty$ and $\sum_{i=1}^{\infty}(\beta^{(i)})^2 < \infty$.

³Throughout this section, dependence on $\boldsymbol{\lambda}^R$ will be made explicit wherever it contributes to clarity.

With these choices, the iterations

$$\boldsymbol{\lambda}^{R(i)} = \boldsymbol{\lambda}^{R(i-1)} + \beta^{(i)}\,\partial D(\boldsymbol{\lambda}^{R(i-1)}) \tag{15}$$



converge to $\boldsymbol{\lambda}^{R*}$ as $i \to \infty$ (cf. [1, Sec. 6.3.1]). A major challenge in obtaining $\boldsymbol{\lambda}^{R*}$ using (15) is that $[\partial D(\boldsymbol{\lambda}^R)]_m$ is discontinuous, because $\mathbf{W}^*(\mathbf{J},\boldsymbol{\lambda}^R)$ is not continuous at every $\boldsymbol{\lambda}^R$ that gives rise to a tie. This problem is critical, because in most cases $\boldsymbol{\lambda}^{R*}$ is one of the points where $[\partial D(\boldsymbol{\lambda}^R)]_m$ is discontinuous. Note that discontinuity of the primal solution at $\boldsymbol{\lambda}^{R*}$ implies that obtaining a solution arbitrarily close to the optimum in the dual domain does not guarantee obtaining a solution arbitrarily close to the optimum in the primal domain. Specifically, after running a sufficiently large but finite number of iterations $I$, we can guarantee that $\boldsymbol{\lambda}^{R(I)}$ is a very good approximation of $\boldsymbol{\lambda}^{R*}$, but we cannot guarantee that $\mathbf{W}^*(\mathbf{J},\boldsymbol{\lambda}^{R(I)})$ is a good approximation of $\mathbf{W}^*(\mathbf{J},\boldsymbol{\lambda}^{R*})$. In fact, it can be shown that such schedules are significantly different for a subset of Q-CSI realizations $\mathbf{J}$, and that the schedule $\mathbf{W}^*(\mathbf{J},\boldsymbol{\lambda}^{R(I)})$ is not a feasible solution of (1), since it violates the average rate constraints.
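For reference, a projected form of the subgradient iteration (15) with a diminishing stepsize is sketched below in Python. Two choices are assumptions: the stepsize $\beta^{(i)} = \beta_0/i$, which satisfies both summability conditions stated before (15), and the projection onto $\boldsymbol{\lambda}^R \geq \mathbf{0}$, a standard safeguard not written in (15). The routine only tracks the dual variable; as discussed above, the hard schedule evaluated at the final iterate may still violate the average rate constraints.

```python
def subgradient_ascent(subgrad, lam0, beta0=1.0, iters=10_000):
    """Projected subgradient ascent for a concave dual D(lam) over lam >= 0.

    subgrad(lam) must return a subgradient of D at lam (a list of floats);
    the stepsize beta_i = beta0 / i is diminishing, with sum beta_i = inf and
    sum beta_i**2 < inf.
    """
    lam = [float(x) for x in lam0]
    for i in range(1, iters + 1):
        step = beta0 / i
        g = subgrad(lam)
        # Ascent step (15) followed by projection onto the nonnegative orthant.
        lam = [max(0.0, l + step * gm) for l, gm in zip(lam, g)]
    return lam
```

For instance, with the toy concave function $-|\lambda_1 - 3| - |\lambda_2 - 1|$ (not the dual of (1)), whose subgradient is $[\operatorname{sign}(3-\lambda_1), \operatorname{sign}(1-\lambda_2)]$, the iterates started at the origin end up within roughly $\beta_0/i$ of $(3, 1)$.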

Our approach to solve this problem is to obtain Lipschitz continuity by smoothing the scheduling function. Smoothing ensures continuity or differentiability and has been successfully applied to different optimization problems; see, e.g., [22] and [14]. Since scheduling discontinuities appear in the transition from a tie to a single winner (check (10), (11), and the upper left and right plots of Figure 1), the idea is to relax the condition for scheduling in the kth channel only when $m \in \mathcal{M}(\mathbf{J},k)$. This is possible through the set $\mathcal{M}^s(\mathbf{J},k) := \{m : ([\mathbf{C}_W(\mathbf{J},\boldsymbol{\lambda}^R)]_{m,k} - [\mathbf{c}^*_w(\mathbf{J},\boldsymbol{\lambda}^R)]_k < \epsilon) \wedge ([\mathbf{c}^*_w(\mathbf{J},\boldsymbol{\lambda}^R)]_k < 0)\}$, where $\epsilon$ is a small positive number. Based on $\mathcal{M}^s(\mathbf{J},k)$, consider the following suboptimal but smooth scheduling matrix

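$$[\mathbf{W}^s(\mathbf{J},\boldsymbol{\lambda}^R)]_{m,k} := \mathbb{1}_{\{m\in\mathcal{M}^s(\mathbf{J},k)\}}\ \cdots\ \left([\mathbf{C}_W(\mathbf{J},\boldsymbol{\lambda}^R)]_{m,k} - [\mathbf{c}^*_w(\mathbf{J},\boldsymbol{\lambda}^R)]_k\right)^2\ \cdots \tag{16}$$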

Clearly, $[\mathbf{W}^s(\mathbf{J},\boldsymbol{\lambda}^R)]_{m,k}$ schedules channel k not only to users m whose cost is minimum but also to those whose cost is $\epsilon$-close to the minimum. This can be readily appreciated in the lower left and lower right plots of the example illustrated in Figure 1. According to the upper left plot, when $[\boldsymbol{\lambda}^R]_2 \in (3.45, 3.5)$ the optimum allocation assigns the channel to user 1, meaning that its cost is the lowest in that interval. However, according to the lower right plot, when $[\boldsymbol{\lambda}^R]_2 \in (3.45, 3.5)$ the smooth allocation assigns a portion of the channel also to user 2. This is because, although the cost of user 1 is still smaller, within that interval the difference of costs between the two users is less than $\epsilon$. Something similar happens when $[\boldsymbol{\lambda}^R]_2 \in (3.5, 3.55)$, but in this case user 2 is the one with the smallest cost.
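The sketch below does not reproduce (16) term by term. It uses an assumed quadratic taper: each user in $\mathcal{M}^s(\mathbf{J},k)$ gets weight $(\epsilon - \text{gap})^2$, where gap is its cost minus the minimum cost, and the weights are normalized to sum to one. This form is an illustrative choice, picked because it satisfies properties (i)-(iii) of Proposition 3 and makes a user's share vanish continuously as its gap approaches $\epsilon$; the function name is hypothetical.

```python
def smooth_schedule(costs, eps):
    """Smooth channel shares for one channel k, in the spirit of (16).

    Users whose cost is within eps of the (negative) minimum share the channel;
    the quadratic taper (eps - gap)**2 is an assumed weighting.
    """
    c_star = min(costs)
    if c_star >= 0.0:
        return [0.0] * len(costs)          # no negative cost: channel left idle
    weights = [(eps - (c - c_star)) ** 2 if c - c_star < eps else 0.0
               for c in costs]
    total = sum(weights)                   # >= eps**2 > 0 thanks to the minimizer
    return [w / total for w in weights]
```

With an exact tie the channel is split evenly, and a user whose gap approaches $\epsilon$ from below sees its share shrink smoothly to zero instead of dropping abruptly.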

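The scheduling in (16) exhibits other relevant properties that are summarized in the next proposition.

Proposition 3: The smooth scheduler $\mathbf{W}^s(\mathbf{J},\boldsymbol{\lambda}^R)$ satisfies the following:

(i) If $[\mathbf{W}^s(\mathbf{J},\boldsymbol{\lambda}^R)]_{m,k} > 0$, then $m \in \mathcal{M}^s(\mathbf{J},k)$ and $[\mathbf{C}_W(\mathbf{J},\boldsymbol{\lambda}^R)]_{m,k} < [\mathbf{c}^*_w(\mathbf{J},\boldsymbol{\lambda}^R)]_k + \epsilon$;

(ii) If $|\mathcal{M}^s(\mathbf{J},k)| > 0$, then $\sum_{m\in\mathcal{M}^s(\mathbf{J},k)}[\mathbf{W}^s(\mathbf{J},\boldsymbol{\lambda}^R)]_{m,k} = 1$;

(iii) If $|\mathcal{M}(\mathbf{J},k)| = 0$, then $[\mathbf{W}^s(\mathbf{J},\boldsymbol{\lambda}^R)]_{m,k} = 0\ \forall m$; and

(iv) $[\mathbf{W}^s(\mathbf{J},\boldsymbol{\lambda}^R)]_{m,k}$ is a continuous function of $\boldsymbol{\lambda}^R$.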

Proof: The construction of the scheduling matrix (16) can be readily used to verify the claims (i)-(iv).
Properties (i)-(iii) of $\mathbf{W}^s$ are similar to those of $\mathbf{W}^*$ stated in Proposition 2, while (iv) ensures continuity (check lower plots in Figure 1). Besides being continuous, the smooth scheduling also lowers complexity relative to its discontinuous counterpart. In fact, when a tie occurs, finding $\mathbf{W}^*(\mathbf{J})$ requires solving a linear program that involves channel realizations other than $\mathbf{J}$ (recall Example 2), while finding $\mathbf{W}^s(\mathbf{J})$ requires only the computation of the closed form in (16), without having to consider any channel realization other than $\mathbf{J}$.

Fig. 1. Optimal (top) and smooth (bottom) channel allocation for the kth channel as $[\boldsymbol{\lambda}^R]_2$ varies. The simulated set-up is: M = 2, $\epsilon = 0.01$, $[\boldsymbol{\lambda}^R]_1 = \lambda$ is kept constant, and $[\mathbf{C}_W(\mathbf{J},\boldsymbol{\lambda}^R)]_{1,k} = [\mathbf{C}_W(\mathbf{J},\boldsymbol{\lambda}^R)]_{2,k}$ when $[\boldsymbol{\lambda}^R]_1 = \lambda$ and $[\boldsymbol{\lambda}^R]_2 = 3.5$.

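Based on Proposition 3, the following result can be established.

Lemma 1: If $D^s(\boldsymbol{\lambda}^R) := \mathcal{L}(\boldsymbol{\lambda}^R,\mathbf{R}^*(\mathbf{J},\boldsymbol{\lambda}^R)\odot\mathbf{W}^s(\mathbf{J},\boldsymbol{\lambda}^R),\mathbf{W}^s(\mathbf{J},\boldsymbol{\lambda}^R))$ and $[\partial^s D(\boldsymbol{\lambda}^R)]_m := [\check{\mathbf{r}}]_m - \sum_{\forall\mathbf{J}}\sum_{k=1}^{K}[\mathbf{R}^*(\mathbf{J},\boldsymbol{\lambda}^R)]_{m,k}[\mathbf{W}^s(\mathbf{J},\boldsymbol{\lambda}^R)]_{m,k}\Pr\{\mathbf{J}\}$ denote smooth versions of the dual function and its subgradient, then:

(i) For all $\boldsymbol{\lambda}^R$, it holds that $D(\boldsymbol{\lambda}^R) \leq D^s(\boldsymbol{\lambda}^R) \leq D(\boldsymbol{\lambda}^R) + \epsilon'$, where $\epsilon' := K\epsilon$; and

(ii) $[\partial^s D(\boldsymbol{\lambda}^R)]_m$ is a Lipschitz continuous and decreasing function of $\boldsymbol{\lambda}^R$.

Proof: Appendix C.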

Lemma 1 guarantees that $\partial^s D(\boldsymbol{\lambda}^R)$ is a Lipschitz continuous $\epsilon'$-subgradient of $D(\boldsymbol{\lambda}^R)$ [1, p. 625] and will play a critical role in the convergence results presented later in Propositions 4 and 5. At this point, we are ready to prove the following result.

Proposition 4: If $\beta$ is a small constant stepsize, there exists $\lambda^{R(0)}$ such that:

(i) the iteration
$$\lambda^{R(i)} = \lambda^{R(i-1)} + \beta\, \partial^s D(\lambda^{R(i-1)}) \qquad (17)$$
converges, i.e., $\lambda^{R(i)} \to \lambda^{Rs}$; and

(ii) at the limit point it holds that $D(\lambda^{R*}) \le D^s(\lambda^{Rs}) \le D(\lambda^{R*}) + \epsilon'$.
Proof: To prove part (i), it suffices to show that (17) is a nonlinear contraction mapping, which basically requires: (a) the existence of $\lambda^{Rs}$ such that $\partial^s D(\lambda^{Rs}) = 0$ (this is trivial because the entries of the smooth subgradient are continuous); and (b) the Jacobian of $\partial^s D(\lambda^R)$ to be negative definite with bounded eigenvalues. These two properties of the Jacobian are proved in Appendix D. The proof of part (ii) is simpler and relies on Lemma 1-(i) and on the fact that the duality gap is zero; see Appendix E for details.

Proposition 4 is of paramount importance. First, it guarantees that if $R^*(J,\lambda^R)$ and $W^s(J,\lambda^R)$ are implemented with $\lambda^R = \lambda^{Rs}$, then the average rate constraints are satisfied with equality (recall that $\partial^s D(\lambda^R) = 0$ only if this is the case). Second, it provides a systematic algorithm to compute $\lambda^{Rs}$. Third and foremost, it guarantees that the overall weighted average power penalty paid for implementing the smooth policy $R^*(J,\lambda^{Rs})$ and $W^s(J,\lambda^{Rs})$ instead of the optimum policy $R^*(J,\lambda^{R*})$ and $W^*(J,\lambda^{R*})$ is no larger than $\epsilon'$.⁴ The latter assertion holds because, according to the definitions of $D(\lambda^R)$ in (13) and $D^s(\lambda^R)$ in Lemma 1, the values of the dual functions coincide with those of the Lagrangian when the optimum and the smooth policies are implemented, respectively. Since all the constraints are satisfied with equality when $D(\lambda^{R*})$ and $D^s(\lambda^{Rs})$ are evaluated via the Lagrangian, the only remaining term in the Lagrangians is the overall weighted average transmit power. Therefore, the bounds on the dual values in Proposition 4-(ii) directly translate into bounds on the overall weighted average power consumption.

An algorithm based on Proposition 4 to find $\lambda^{Rs}$ is described next:

Algorithm 1: Calculation of the Lagrange multipliers

(S1.0) Initialization: set the vectors $\delta_1$, $\delta_2$ to small positive values; set $\lambda^{R(0)} = \delta_1$ and the iteration index $i = 1$.

(S1.1) Resource allocation update: per Q-CSI realization $J$, use $\lambda^{R(i-1)}$ to obtain $R(J)^{(i)}$ and $P(J)^{(i)}$ based on (8) and $T_{\mathcal{R}([J]_{m,k})}$, and $W^s(J)^{(i)}$ using (16).

(S1.2) Dual update: use the allocations from (S1.1) to find $\partial^s D(\lambda^{R(i-1)})$. Stop if $\|\partial^s D(\lambda^{R(i-1)})\| < \delta_2$; otherwise, update $\lambda^{R(i)}$ as in (17), set $i = i + 1$, and go to (S1.1).
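The steps of Algorithm 1 can be sketched in Python as follows. The ensemble subgradient of step (S1.1) is passed in as a callable, because computing it requires (8), (16), and $\Pr\{J\}$. The affine `toy_subgradient` used in the example is a hypothetical stand-in whose Jacobian is negative definite with bounded eigenvalues, which is the property the proof of Proposition 4 relies on; stepsize and tolerances are likewise illustrative.

```python
import math


def algorithm1(subgradient, num_users, beta=0.1, delta1=1e-3, delta2=1e-9,
               max_iter=100_000):
    """Sketch of Algorithm 1 (dual iteration (17) with a stopping rule).

    subgradient(lam) must return [d^s D(lam)]_m for every user m, i.e. the
    rate requirement minus the ensemble-average rate obtained in step (S1.1).
    """
    lam = [delta1] * num_users                        # (S1.0) initialization
    for i in range(1, max_iter + 1):
        g = subgradient(lam)                          # (S1.1) and first part of (S1.2)
        if math.sqrt(sum(x * x for x in g)) < delta2:
            return lam, i                             # stopping rule of (S1.2)
        lam = [old + beta * x for old, x in zip(lam, g)]  # update (17)
    return lam, max_iter


def toy_subgradient(lam):
    """Hypothetical affine subgradient r_bar - A lam with A positive definite,
    so its Jacobian -A is negative definite; the fixed point is lam = [1, 1]."""
    A = [[1.0, 0.2], [0.2, 2.0]]
    r_bar = [1.2, 2.2]
    return [r_bar[m] - sum(A[m][j] * lam[j] for j in range(2)) for m in range(2)]


lam, iters = algorithm1(toy_subgradient, num_users=2)
print(lam, iters)
```

In the real algorithm each call to `subgradient` is the expensive part, since it averages over every $J$; the loop itself is only the few lines of (S1.0)-(S1.2).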

Because the problem is formulated in terms of averages, Algorithm 1 entails computing the average rate and power per user, which requires knowledge of the joint channel distribution; specifically, $\Pr\{J\}$ needs to be known $\forall J$. Algorithm 1 must be run during an initialization (off-line) phase before communication starts, and it only needs to be re-run if either the channel statistics or the users' QoS requirements change. Once $\lambda^{Rs}$ is known, the ($\epsilon'$-)optimum allocation per $J$ is found online using $R^*(J,\lambda^{Rs})$, $T_{\mathcal{R}([J]_{m,k})}$, and $W^s(J,\lambda^{Rs})$. Since closed-form expressions for these are available [cf. (8) and (16)], the computational burden associated with the online phase is negligible.

A. Stochastic Estimation of the Lagrange Multipliers 

As mentioned before, $\lambda^{Rs}$ is obtained off-line using Algorithm 1, which requires knowledge of the channel distribution. However, this computation cannot always be carried out efficiently, and it may even be infeasible. This is the case when: (a) the number of users, channel statistics, and QoS requirements change so frequently that $\lambda^{Rs}$ has to be continuously re-computed; (b) the system has limited complexity and cannot afford the off-line burden; or (c) the joint channel distribution is unknown. For those situations, stochastic approximation algorithms [7] arise as an alternative way to estimate $\lambda^{Rs}$ [20].

4 In practice, the gap w.r.t. $D(\lambda^{R*})$ is much smaller than $\epsilon'$. This is because $W^s(J,\lambda^R) \ne W^*(J,\lambda^R)$ only if $|\mathcal{M}^s(J,k)| > 1$, which is a rare event; hence, on average, the bound in Lemma 1 is very loose; see also Appendix C.

Let $n$ index the current block (whose duration corresponds to the channel coherence interval $T_{ch}$), and let $J[n]$ denote the fading state during block $n$. Our proposal amounts to replacing the ensemble-average subgradient $[\partial^s D(\lambda^R)]_m = [\bar r]_m - \sum_{\forall J}\sum_{\forall k}[R^*(J,\lambda^R)]_{m,k}[W^s(J,\lambda^R)]_{m,k}\Pr\{J\}$ with its stochastic version $[\partial^s D(\lambda^R,n)]_m := [\bar r]_m - \sum_{\forall k}[R^*(J[n],\lambda^R)]_{m,k}[W^s(J[n],\lambda^R)]_{m,k}$. Using this definition,⁵ the original iterations over $\lambda^{R(i)}$ in (17) can be replaced by their estimates
$$\lambda^R[n+1] = \lambda^R[n] + \beta\, \partial^s D(\lambda^R[n], n) \qquad (18)$$
where $\beta$ is again a constant stepsize. Capitalizing on the Lipschitz continuity of $\partial^s D(\lambda^R,n)$, it can be shown that for sufficiently small $\beta$: (i) the trajectories of the iterations in (17) and (18) are locked; and (ii) the stochastic iterates in (18) converge to a neighborhood of $\lambda^{Rs}$. Specifically, we have:
Proposition 5: With initial conditions similar to those of (17) and given $T > 0$, there exist $b_T > 0$ and $\beta_T > 0$ such that, almost surely,
$$\max_{1 \le n \le T/\beta} \|\lambda^{R(n)} - \lambda^R[n]\| \le c_T(\beta)\, b_T \qquad (19)$$
where $0 < \beta < \beta_T$ and $c_T(\beta) \to 0$ as $\beta \to 0$.

Proof: The result in (19) can be shown by adopting the averaging approach in [15, Chapter 9]. Following the averaging method for approximating the trajectory of a difference equation, the updates in (18) and those in (17) can be seen as a pair of primary and averaged systems. Under general conditions, the trajectory locking of these two systems can be shown via [15, Theorem 9.1]. The full proof is omitted due to space limitations, but the main idea hinges on the Lipschitz continuity of $\partial^s D(\lambda^R,n)$ to prove that the most challenging conditions required in [15, Theorem 9.1] hold. Interestingly, as $n \to \infty$, a similar approach can be used to show convergence in probability of (18) to (17) [15, Theorem 9.5].

Proposition 5 states not only that the trajectories of the online iterations remain locked to those of the original ensemble (off-line) iterations, but also that the gap between them shrinks as the stepsize (which is at our disposal) vanishes. The result holds for a constant (non-zero) $\beta$, which allows the iterations in (18) to cope with channel non-stationarities and to track changes in the system set-up (e.g., users entering or leaving the system). This type of convergence differs from that exhibited by other relevant stochastic resource allocation schemes [16], [20].
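The trajectory locking in Proposition 5 can be seen in a scalar toy that runs the online update (18) next to the off-line update (17) with the same stepsize and initialization. The served-rate model $\lambda g[n]$, with $g[n]$ exponentially distributed as under (as4), is a hypothetical simplification chosen so that the off-line fixed point is exactly $\bar r$; it does not implement $R^*$ or $W^s$.

```python
import random


def online_vs_offline(r_bar=2.0, beta=0.01, n_blocks=20_000, seed=0):
    """Toy scalar comparison of the online update (18) with the off-line update (17).

    Hypothetical model: in block n the user is served rate lam * g[n], where
    g[n] ~ Exp(1) is a Rayleigh power gain. The ensemble-average rate is lam,
    so the off-line subgradient is r_bar - lam and its fixed point is r_bar.
    """
    rng = random.Random(seed)
    lam_on = lam_off = 0.1                       # identical initialization
    online, gaps = [], []
    for _ in range(n_blocks):
        g = rng.expovariate(1.0)                 # fading state of block n
        lam_on += beta * (r_bar - lam_on * g)    # stochastic update, as in (18)
        lam_off += beta * (r_bar - lam_off)      # ensemble update, as in (17)
        online.append(lam_on)
        gaps.append(abs(lam_on - lam_off))
    return online, gaps


online, gaps = online_vs_offline()
tail = online[5000:]                             # discard the transient
print(sum(tail) / len(tail), max(gaps))
```

Rerunning with a smaller `beta` (and proportionally more blocks) should shrink `max(gaps)`, in line with $c_T(\beta) \to 0$ in (19), while the constant stepsize keeps the iterate responsive to changes in `r_bar`.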

From an implementation perspective, it must be emphasized that the iterations in (18) can be implemented online without knowing the channel distribution. This eliminates the need for running Algorithm 1 during an off-line phase and greatly reduces the overall complexity, at the price of a moderate increase in complexity during the online (communication) phase. To clarify these assertions, the system operation is described next for the case where the channel-adaptive schemes are implemented based on $\lambda^{Rs}$ (non-stochastic implementation) and for the case where they are implemented based on $\lambda^R[n]$ (stochastic implementation).

• Systems implementing non-stochastic adaptive schemes operate in two phases. During an off-line (initialization) phase, Algorithm 1 is executed and the returned value of $\lambda^{Rs}$ is distributed to the transceivers. During the online phase, the value of $J$ is updated every coherence interval, and the powers, rates, and scheduling are adapted with $\lambda^R = \lambda^{Rs}$ and $J = J[n]$.
• Systems implementing stochastic adaptive schemes operate purely online. During the online phase, two tasks are carried out per coherence interval. First, the powers, rates, and scheduling are adapted with $\lambda^R = \lambda^R[n]$ and $J = J[n]$. Second, the multiplier estimates for the next block, $\lambda^R[n+1]$, are updated according to (18).

5 Stochastic implementations of $\partial^s D(\lambda^R,n)$ different from the one proposed here are also possible. For example, convergence to the optimum value can also be proved, using arguments similar to those in Proposition 5, for stochastic versions based on finite-time-window averaging or sample averaging.

The stochastic schemes also entail a change in where the computations are carried out. In the non-stochastic case, Algorithm 1 will likely be implemented at the access point, and the value of $\lambda^{Rs}$ will be transmitted once wherever needed. In the stochastic case, however, $\lambda^R[n]$ is updated every coherence interval, and therefore instantaneous broadcasting of the analog value of $\lambda^R[n]$ is not feasible. This implies that, during system operation, the iterations in (18) will have to be implemented at different locations. Thus, a transmitter that wishes to implement its optimal rate loading in (8) will need to know its own entry of $\lambda^R[n]$, while an access point that wants to find the optimum scheduling in (16) will need to know the entire vector $\lambda^R[n]$. In line with Proposition 5, all the transceivers will have to use identical initialization to ensure consistency.

V. Overhead Issues 

The previous sections focused on formulating the channel-adaptive schemes and on developing systematic ways to obtain the variables involved in these optimal schemes. This section addresses the overhead of such schemes, which relates to practical implementation issues. Specifically, we try to answer questions such as: How many different optimum resource allocations are there? How much feedback is required to implement the developed schemes? What do the functions involved in the optimal schemes look like for practical modulations? This overview will not only allow for more efficient implementations of the novel adaptive schemes, but will also provide insight into channel-adaptive resource allocation and finite-rate feedback.

A. Exploiting the structure of the optimum solution 

Two properties of the optimal resource allocation are useful to reduce the computational overhead. Specifically, we observe that:

P1) Given $\lambda^{R*}$, the optimum rate matrix $R^*$ in (8) satisfies the following: (i) for a given user $m$, it does not depend on the other users $m' \ne m$; and (ii) the optimum rate allocation for channel $k$ can be carried out separately from the allocation of the remaining channels $k' \ne k$. Since the power-rate function depends on the specific region $\mathcal{R}([J]_{m,k})$, these properties imply that the optimal rate (and thus power) allocation for user $m$ on channel $k$ can be obtained separately from the rate allocation in the remaining regions $\mathcal{R}([J]_{m',k'}) \ne \mathcal{R}([J]_{m,k})$. In other words, the rate allocation can be written as $[R^*(J)]_{m,k} = [R^*([J]_{m,k})]_{m,k}$.

P2) Given $\lambda^{R*}$, the previous observations can be used to write the cost indicator function as $[C_W(J)]_{m,k} = [C_W([J]_{m,k})]_{m,k}\ \forall (J,m,k)$. Since the user scheduling for channel $k$, that is, $[W^s(J)]_{m,k}\ \forall m$, is found based on $[C_W([J]_{m,k})]_{m,k}\ \forall m$, information about channels $k' \ne k$ is not needed [cf. (16)]. Therefore, the user-scheduling allocation can be written as $[W^s(J)]_{m,k} = [W^s([J]_k)]_{m,k}$.
Properties P1) and P2) point out that, for a given channel realization $J$, the vector $\lambda^{R*}$ encapsulates most of the information the $(m,k)$ user-channel pair needs from channel realizations other than $J$, channels other than $k$, and users other than $m$.

To appreciate the implications of P1) and P2), in the following we consider that each individual channel domain is divided into $L$ quantization regions. Without loss of optimality, the quantization regions can be represented by a set of thresholds $\{q_{m,k,l}\}_{l=1}^{L+1}$. Hence, if $g_{m,k} \in [q_{m,k,l}, q_{m,k,l+1})$, then $[J]_{m,k} = l$; see, e.g., [13]. (Note that since $g_{m,k} \in \mathbb{R}_+$, $q_{m,k,1} = 0$ and $q_{m,k,L+1} = \infty\ \forall (m,k)$.)

An immediate implication of P1) and P2) is that the average over $J$ can be decomposed into sub-averages across channels. Specifically, with $\mathcal{J}_k$ denoting the set of possible values $[J]_k$ takes, each individual average rate can be rewritten as
$$\sum_{\forall J \in \mathcal{J}} \Big(\sum_{k=1}^{K} [R^*([J]_{m,k})]_{m,k}[W^*(J)]_{m,k}\Big)\Pr\{J\} = \sum_{k=1}^{K}\Big(\sum_{\forall j \in \mathcal{J}_k} [R^*([j]_m)]_{m,k}[W^*(j)]_{m,k}\Pr\{[J]_k = j\}\Big).$$
While the left-hand side requires $K|\mathcal{J}| = K L^{KM}$ summations, the right-hand side only requires $K|\mathcal{J}_k| = K L^{M}$.
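The decomposition can be checked by brute force on a tiny instance. The sketch below draws an arbitrary joint pmf $\Pr\{J\}$ (no independence is assumed), uses a hypothetical per-channel function `f(k, j)` in place of $[R^*([j]_m)]_{m,k}[W^*(j)]_{m,k}$, and evaluates both sides; it also returns the two term counts $KL^{KM}$ and $KL^M$. The marginals are obtained here by summing the joint pmf only for the sake of the check; under (as3)-(as4) they follow directly from (21).

```python
import itertools
import random


def check_decomposition(K=2, M=2, L=3, seed=0):
    """Evaluate both sides of the average-rate decomposition for one user.

    A realization J is stored as a flat tuple, channel by channel, so that
    J[k*M:(k+1)*M] is the column [J]_k (one quantization index per user).
    """
    rng = random.Random(seed)
    states = list(itertools.product(range(L), repeat=K * M))  # |J| = L**(K*M)
    weights = [rng.random() for _ in states]
    total = sum(weights)
    pmf = {J: w / total for J, w in zip(states, weights)}     # arbitrary joint Pr{J}

    def column(J, k):
        return J[k * M:(k + 1) * M]

    def f(k, j):
        # Hypothetical stand-in for [R*([j]_m)]_{m,k} [W*(j)]_{m,k}.
        return (k + 1) * sum(j) / (1 + max(j))

    # Left-hand side: sum over all J, then over channels.
    lhs = sum(p * sum(f(k, column(J, k)) for k in range(K)) for J, p in pmf.items())

    # Right-hand side: per-channel marginals Pr{[J]_k = j}, then sums over j.
    rhs = 0.0
    for k in range(K):
        marginal = {}
        for J, p in pmf.items():
            j = column(J, k)
            marginal[j] = marginal.get(j, 0.0) + p
        rhs += sum(f(k, j) * q for j, q in marginal.items())

    return lhs, rhs, K * len(states), K * L ** M


lhs, rhs, terms_lhs, terms_rhs = check_decomposition()
print(lhs, rhs, terms_lhs, terms_rhs)
```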

Another possibility to reduce complexity is to cluster different channel realizations that give rise to the same optimal resource allocation. For example, consider a channel realization $J_1$ for which user $m'$ is found to be the winner for the $k$th channel, and a different channel realization $J_2$ such that $[J_1]_{m',k} = [J_2]_{m',k}$ and $[C_W(J_2)]_{m,k} > [C_W(J_1)]_{m,k}\ \forall m \ne m'$. Clearly, user $m'$ will again be the winner, and the resource allocation over the $k$th channel will be the same for both $J_1$ and $J_2$. This can be formalized as follows.

Proposition 6: Assume that $[R^*([J]_{m,k}+1)]_{m,k} \ge [R^*([J]_{m,k})]_{m,k}$ (i.e., the better the channel, the higher the allocated rate), and define $\mathcal{J}_k^{m,l} := \{j \in \mathcal{J}_k : [W^*(j)]_{m,k} = 1 \wedge [j]_m = l\}$. It then holds that:

(i) If $j \in \mathcal{J}_k^{m,l}$, then $\{j' \in \mathcal{J}_k : [j']_{m'} = [j]_{m'}\ \forall m' \ne m \wedge [j']_m \ge [j]_m\} \subseteq \bigcup_{l' \ge l}\mathcal{J}_k^{m,l'}$;

(ii) If $j \in \mathcal{J}_k^{m,l}$, then $\{j' \in \mathcal{J}_k : [j']_{m'} \le [j]_{m'}\ \forall m' \ne m \wedge [j']_m = [j]_m\} \subseteq \mathcal{J}_k^{m,l}$;

(iii) If $j \notin \mathcal{J}_k^{m,l}$, then $\{j' \in \mathcal{J}_k : [j']_{m'} \ge [j]_{m'}\ \forall m' \ne m \wedge [j']_m = [j]_m\} \cap \mathcal{J}_k^{m,l} = \emptyset$.



Proof: Appendix F. Under the reasonable assumption that $[R^*([J]_{m,k}+1)]_{m,k} \ge [R^*([J]_{m,k})]_{m,k}$ (which holds for the examples of $T$ in this paper), the properties in Proposition 6 allow one to group the channel realizations $J$ into clusters that yield the same optimum resource allocation. Clustering can be exploited to reduce the calculations required to determine the optimum resource allocation (Algorithm 1), as well as to reduce the finite-rate feedback overhead, as discussed next.



B. Finite-Rate Feedback 

As mentioned earlier, for non-reciprocal channels the Q-CSI can be naturally obtained at the transmitters through finite-rate feedback from the receiver. Since $\mathcal{J}$ has finite cardinality, a finite number of bits $B := \lceil \log_2(|\mathcal{J}|) \rceil$ clearly suffices to index the current realization $J$. To ensure that the Q-CSIT coincides with the Q-CSIR, we will assume that:

(as2) the feedback channel is error-free and incurs negligible delay, and the channels remain invariant over at least two consecutive symbols.

Note that this is a pragmatic assumption for Q-CSI, since each channel can vary from one symbol to the next as long as the quantization region it falls into remains invariant. In addition, error-free feedback can typically be guaranteed with sufficiently strong error-control codes, especially since the rate in the reverse link is low.
Although in principle the resource allocation varies as a function of $J$, from an operational perspective the main objective is not to feed back the current $J$ to the transmitters, but to identify the optimal resource allocation the transmitters have to implement. These tasks are not equivalent because, as stated in Proposition 6, different channel realizations can be mapped to the same resource allocation. In other words, even if the receiver detects that the quantized value of the channel has changed from $J_1$ to $J_2$, when the resource allocation is the same in both cases there is no difference between $J_1$ and $J_2$ for the transmitters, and they do not need feedback notifying them of the change. This difference is meaningful because, as hinted by P1) and P2), the number of distinct optimal resource allocations is much smaller than the cardinality of the Q-CSI matrix. Therefore, to find the minimum amount of feedback the transmitters require, the cardinality of the optimum resource allocation, $[R^*(J)]_{m,k} = [R^*([J]_{m,k})]_{m,k}$ and $[W^s(J)]_k = [W^s([J]_k)]_k$, has to be carefully examined.

Regarding the rate (power) allocation, it is easy to see that $|\{[R^*([J]_{m,k})]_{m,k}\}_{\forall J}| = L$. The cardinality of the set of different user schedulings depends on whether the winner is unique or not. When the winner is unique, the cardinality is also easy to determine: either $|\{[W^s([J]_k)]_k\}_{\forall J}| = M$ if one user is always active, or $|\{[W^s([J]_k)]_k\}_{\forall J}| = M + 1$ if the additional case of "no user transmitting" is considered (i.e., the possibility that $|\mathcal{M}^s(J,k)| = 0$). For channel realizations in which the winner is not unique, the analysis is more complicated. Consider again the system described in the earlier example with $K = 1$ and $M = 4$, and suppose now that we have a channel realization $J' = [J']_1$ such that user 1 achieves the minimum cost $[C_W(J')]_{1,1}$, but the cost of user 2 is very close to it, e.g., $[C_W(J')]_{2,1} = [C_W(J')]_{1,1} + \epsilon/2$. Substituting those costs into (16), we have $[W^s(J')]_{1,1} = 4/5$ and $[W^s(J')]_{2,1} = 1/5$. This implies that the set $\{W^s(J)\}_{\forall J}$ contains not only the single-user allocations $\{[1,0,0,0]^T, [0,1,0,0]^T, [0,0,1,0]^T, [0,0,0,1]^T\}$ and the no-user allocation $[0,0,0,0]^T$, but also the additional element $[4/5, 1/5, 0, 0]^T$. From a practical perspective, it is worth noticing that the user-sharing policy can be implemented in two different ways. Recalling that $T_{ch}$ denotes the coherence interval, a first option is for user 1 to transmit during $4T_{ch}/5$ seconds and for user 2 to transmit during the remaining $T_{ch}/5$ seconds. Alternatively, each time realization $J'$ occurs, user 1 can transmit with probability 4/5, and user 2 transmits otherwise. Note that if scheduling follows the first option, the number of different user schedulings per channel is indeed higher than $M + 1$. However, if the system implements the second option, the number of different user-scheduling policies is $|\{[W^s([J]_k)]_k\}_{\forall J}| = M + 1$, maintaining its original value. Since the second implementation entails lower feedback overhead, the ensuing analysis assumes that the system implements channel sharing using a probabilistic access scheme.
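The probabilistic access option can be sketched as follows: each time the realization occurs, one user is drawn according to the corresponding column of $W^s$. The function name and the use of Python's `random.choices` are illustrative choices, not part of the original scheme.

```python
import random


def probabilistic_access(w_column, rng):
    """Draw the single user that transmits on channel k for this occurrence of
    the realization, picking user m with probability [W^s]_{m,k}.

    Returns None for the no-user-transmitting case (all-zero column).
    """
    if sum(w_column) == 0:
        return None
    return rng.choices(range(len(w_column)), weights=w_column, k=1)[0]


rng = random.Random(1)
picks = [probabilistic_access([0.8, 0.2, 0.0, 0.0], rng) for _ in range(20_000)]
print(picks.count(0) / len(picks), picks.count(1) / len(picks))
```

Because only the index of the drawn user has to be signaled, the feedback alphabet per channel stays at $M + 1$ symbols regardless of the sharing fractions.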

Based on the previous observations, for the receiver to notify the transmitters of the optimum resource allocation, the following information has to be fed back per channel: the index of the winner user ($M$ possibilities) together with the index of the rate (and power) allocation for that user ($L$ possibilities), plus an additional codeword corresponding to the event of no user transmitting. This implies that the feedback required per channel is $\lceil \log_2(ML + 1) \rceil$ bits. Since the resource allocation is not coupled across channels, the total amount of feedback required is $B' = \lceil K\log_2(ML + 1) \rceil$ bits. This number is significantly smaller than that required to identify the specific channel realization, $\lceil \log_2(|\mathcal{J}|) \rceil = \lceil K\log_2(L^M) \rceil$ bits. In other words, the receiver does not have to index the quantized channel itself, but only the resource allocation it induces.
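The feedback counts above can be reproduced with a few lines of Python. The per-channel codeword layout (0 for no user transmitting, otherwise $1 + mL + l$) and the base-$(ML+1)$ packing across channels are hypothetical encodings chosen only to show that $\lceil K\log_2(ML+1) \rceil$ bits suffice; the numbers $K = 64$, $M = 3$, $L = 4$ are used merely as an example.

```python
import math


def feedback_bits(K, M, L):
    """Return (bits per channel, bits for the joint allocation index,
    bits to index the raw Q-CSI realization), as counted in Section V-B."""
    per_channel = math.ceil(math.log2(M * L + 1))
    allocation_total = math.ceil(K * math.log2(M * L + 1))
    qcsi_total = math.ceil(K * M * math.log2(L))   # ceil(log2 |J|), |J| = L**(K*M)
    return per_channel, allocation_total, qcsi_total


def encode(winner, level, L):
    """Hypothetical per-channel codeword: 0 = no user transmitting,
    otherwise 1 + winner * L + level (winner in 0..M-1, level in 0..L-1)."""
    return 0 if winner is None else 1 + winner * L + level


def decode(codeword, L):
    """Inverse of encode: (winner, level), or (None, None) for no user."""
    if codeword == 0:
        return None, None
    return divmod(codeword - 1, L)


def joint_index(codewords, M, L):
    """Pack K per-channel codewords into one integer written in base M*L + 1."""
    base = M * L + 1
    index = 0
    for c in reversed(codewords):
        index = index * base + c
    return index


print(feedback_bits(64, 3, 4))
```

Coding the $K$ codewords jointly is what turns $K\lceil\log_2(ML+1)\rceil$ into the slightly smaller $\lceil K\log_2(ML+1)\rceil$.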
Finally, it is worth remarking that the overhead assessment so far does not exploit the potential correlation of the fading channel across users (i.e., $[J^T]_m$ and $[J^T]_{m'}$), channels (i.e., $[J]_k$ and $[J]_{k'}$), or time (i.e., $J[n]$ and $J[n']$). If such correlations were exploited, the total amount of feedback could be further reduced. Although exploiting channel correlation to reduce the feedback overhead is certainly a topic of interest, it goes beyond the scope of this work.



C. A simple channel model 

In this section, several assumptions are made that allow one to obtain explicit expressions for the probability mass function of the channel. Suppose first that:

(as3) the fading processes for different users are uncorrelated, which implies that $J$ has uncorrelated columns; and

(as4) user channels are allowed to be correlated, and each is complex Gaussian distributed; that is, if $\bar g_{m,k}$ denotes the average channel gain, $f_{g_{m,k}}(g_{m,k}) = (1/\bar g_{m,k})\exp(-g_{m,k}/\bar g_{m,k})$ is the exponential pdf of $g_{m,k}$.

Note that (as3) is common when the users are scattered in space, while (as4) corresponds to a Rayleigh flat-fading model.

Using (as3), (as4), and the fact that the quantization regions for individual channel gains are represented by the set of thresholds $\{q_{m,k,l}\}_{l=1}^{L+1}$, the probabilities $\Pr\{[J]_{m,k} = j_{m,k}\}$ and $\Pr\{[J]_k = j\}$ can be found, respectively, as
$$\Pr\{[J]_{m,k} = j_{m,k}\} = e^{-q_{m,k,j_{m,k}}/\bar g_{m,k}} - e^{-q_{m,k,j_{m,k}+1}/\bar g_{m,k}} \qquad (20)$$
$$\Pr\{[J]_k = j\} = \prod_{m=1}^{M}\Big(e^{-q_{m,k,[j]_m}/\bar g_{m,k}} - e^{-q_{m,k,[j]_m+1}/\bar g_{m,k}}\Big). \qquad (21)$$
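Equations (20) and (21) translate directly into code. In the sketch below regions are indexed from 0 rather than from 1 as in the text, and the thresholds list must start at 0 and end at infinity; both are conventions of this illustration, and the threshold values are arbitrary.

```python
import itertools
import math


def region_pmf(thresholds, g_avg):
    """Eq. (20) for all regions of one gain g_{m,k} ~ Exp(mean g_avg).

    thresholds = [q_1 = 0, q_2, ..., q_{L+1} = inf]; entry l of the result is
    the probability of region l (0-based here, 1-based in the text).
    """
    return [math.exp(-lo / g_avg) - math.exp(-hi / g_avg)
            for lo, hi in zip(thresholds[:-1], thresholds[1:])]


def column_pmf(j, thresholds_per_user, g_avg_per_user):
    """Eq. (21): Pr{[J]_k = j} as a product over users, per (as3)."""
    prob = 1.0
    for level, q, g_avg in zip(j, thresholds_per_user, g_avg_per_user):
        prob *= region_pmf(q, g_avg)[level]
    return prob


q = [0.0, 0.5, 1.0, 2.0, math.inf]
print(region_pmf(q, 1.0))
total = sum(column_pmf(j, [q, q], [1.0, 2.0])
            for j in itertools.product(range(4), repeat=2))
print(total)
```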

D. Examples of power-rate functions 

Another issue affecting the implementation of the developed schemes concerns the scenarios for which the power-rate function $T(x)$ satisfies (as1). Using Shannon's capacity formula, expressions for $T(x)$ and $T^{-1}(x)$ that guarantee a specific outage capacity for every region were given in an earlier example. If, instead of that definition, one considers the ergodic capacity of user $m$ over the $k$th channel for its $[J]_{m,k}$th region, it follows that $r_{m,k} = \int_{g_{m,k} \in \mathcal{R}([J]_{m,k})} \log_2(1 + p_{m,k} g_{m,k}) f_{g_{m,k}}(g_{m,k})\, dg_{m,k}$, where $f_{g_{m,k}}$ is here the pdf of $g_{m,k}$ conditioned on $g_{m,k} \in \mathcal{R}([J]_{m,k})$. Using (as4), $T^{-1}(x)$ and, implicitly, $T(x)$ can be written as:
$$T^{-1}_{\mathcal{R}([J]_{m,k})}(x) = \int_{q_{m,k,[J]_{m,k}}}^{q_{m,k,[J]_{m,k}+1}} \log_2(1 + x g_{m,k})\, \frac{e^{-g_{m,k}/\bar g_{m,k}}}{\bar g_{m,k}\Pr\{[J]_{m,k}\}}\, dg_{m,k} \qquad (22)$$
$$T_{\mathcal{R}([J]_{m,k})} = \Big\{x \mapsto y : x - T^{-1}_{\mathcal{R}([J]_{m,k})}(y) = 0\Big\}. \qquad (23)$$
If convenient, the exponential integral function $E_1(x) := \int_x^{\infty} \exp(-t)/t\, dt$ can be used to rewrite (22) in closed form as:
$$T^{-1}_{\mathcal{R}([J]_{m,k})}(x) = \Big[\ln(1 + x q_{m,k,[J]_{m,k}})\, e^{-q_{m,k,[J]_{m,k}}/\bar g_{m,k}} + E_1\Big(\tfrac{1 + x q_{m,k,[J]_{m,k}}}{x \bar g_{m,k}}\Big) e^{1/(x \bar g_{m,k})} - \ln(1 + x q_{m,k,[J]_{m,k}+1})\, e^{-q_{m,k,[J]_{m,k}+1}/\bar g_{m,k}} - E_1\Big(\tfrac{1 + x q_{m,k,[J]_{m,k}+1}}{x \bar g_{m,k}}\Big) e^{1/(x \bar g_{m,k})}\Big] \log_2(e) \Big(e^{-q_{m,k,[J]_{m,k}}/\bar g_{m,k}} - e^{-q_{m,k,[J]_{m,k}+1}/\bar g_{m,k}}\Big)^{-1} \qquad (24)$$

Since $T^{-1}(x)$ is monotonically increasing [cf. (22)], it readily follows that $T(x)$ is also monotonically increasing. The strict convexity of $T(x)$ is shown in Appendix G.
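As a sanity check of (24), the sketch below evaluates the closed form against a direct numerical integration of (22) (composite Simpson's rule) and obtains $T$ from (23) by bisection. $E_1$ is computed with its power series, which is adequate only for moderate arguments, so the closed form is exercised here for moderate powers; the bisection uses the numerical version because its bracket visits very small powers. Helper names are hypothetical.

```python
import math

EULER_GAMMA = 0.5772156649015329


def e1(x, terms=80):
    """E_1(x) = -gamma - ln x - sum_k (-x)^k / (k k!), for 0 < x.
    The series suffers cancellation for large x; use it for roughly x < 10."""
    if math.isinf(x):
        return 0.0
    s, term = 0.0, 1.0
    for k in range(1, terms + 1):
        term *= -x / k                      # term = (-x)^k / k!
        s += term / k
    return -EULER_GAMMA - math.log(x) - s


def inv_t_closed(x, lo, hi, g_avg):
    """Eq. (24): T^{-1} for power x on the region [lo, hi) of an Exp(g_avg) gain."""
    a = 1.0 / (x * g_avg)

    def piece(q):                           # bracketed terms of (24) evaluated at q
        if math.isinf(q):
            return 0.0
        return (math.log(1 + x * q) * math.exp(-q / g_avg)
                + e1(a + q / g_avg) * math.exp(a))

    prob = math.exp(-lo / g_avg) - math.exp(-hi / g_avg)   # eq. (20)
    return (piece(lo) - piece(hi)) * math.log2(math.e) / prob


def inv_t_numeric(x, lo, hi, g_avg, n=6000):
    """Eq. (22) by composite Simpson's rule; an infinite upper limit is
    truncated at lo + 60 * g_avg, where the exponential tail is negligible."""
    top = lo + 60 * g_avg if math.isinf(hi) else hi
    h = (top - lo) / n

    def f(g):
        return math.log2(1 + x * g) * math.exp(-g / g_avg) / g_avg

    s = f(lo) + f(top) + sum((4 if i % 2 else 2) * f(lo + i * h) for i in range(1, n))
    prob = math.exp(-lo / g_avg) - math.exp(-hi / g_avg)
    return s * h / 3 / prob


def t_of_rate(rate, lo, hi, g_avg, iters=60):
    """Eq. (23): invert the increasing map x -> T^{-1}(x) by bisection."""
    a, b = 0.0, 1.0
    while inv_t_numeric(b, lo, hi, g_avg) < rate:
        b *= 2.0
    for _ in range(iters):
        mid = 0.5 * (a + b)
        if inv_t_numeric(mid, lo, hi, g_avg) < rate:
            a = mid
        else:
            b = mid
    return 0.5 * (a + b)


print(inv_t_closed(1.0, 0.5, 1.0, 1.0), inv_t_numeric(1.0, 0.5, 1.0, 1.0))
```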

Besides the power-rate relationship given by the capacity formula, there are situations where transmissions use pre-specified coding and modulation schemes. Since a maximum BER is typically prescribed in those cases, the BER requirement can be used to relate power and rate over a given region. To be more specific, suppose that:

(as5) the symbols are drawn from coded modulations such that the BER function can be adequately approximated by $\epsilon(g_{m,k}, p_{m,k}, r_{m,k}) \approx \kappa_1 \exp\big(-g_{m,k} p_{m,k} \kappa_2/(2^{r_{m,k}} - 1)\big)$,

where $\kappa_1$ and $\kappa_2$ are constants that depend on the specific modulation and code implemented (e.g., for the uncoded case we typically have $\kappa_2 = 1$). In addition to being accurate for many practical modulations [2], [3], (as5) yields tractable mathematical expressions.

If QoS requirements impose a maximum instantaneous BER $\epsilon_{\max}$ per user, (as5) can be used to obtain $T(x)$ in explicit form as
$$T_{\mathcal{R}([J]_{m,k})}(x) = \frac{\ln(\kappa_1/\epsilon_{\max})\,(2^x - 1)}{\kappa_2\, q_{m,k,[J]_{m,k}}}. \qquad (25)$$
Note that if a powerful coding scheme giving rise to a coding gain of $\kappa_2 = \ln(\kappa_1/\epsilon_{\max})$ is implemented, then (25) reduces to the function introduced in the earlier example, which was derived from the outage-capacity formula for $\delta = 0$. The adoption of a maximum instantaneous BER as a QoS requirement also implies that the first region will always be an outage region with zero power and rate, since $q_{m,k,1} = 0$ makes the power cost of transmitting even a minimal rate infinite.
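Equation (25) and the BER model in (as5) can be coded directly. The sketch below checks that the power returned by (25) meets $\epsilon_{\max}$ exactly at the lowest gain of the region and with margin at better gains. The constants $\kappa_1 = 0.2$ and $\kappa_2 = 1.5$ are illustrative values, not taken from the text.

```python
import math


def ber(g, p, r, kappa1, kappa2):
    """BER approximation of (as5)."""
    return kappa1 * math.exp(-g * p * kappa2 / (2 ** r - 1))


def t_inst_ber(rate, q_low, kappa1, kappa2, ber_max):
    """Eq. (25): power meeting ber_max for every gain of a region whose
    lowest gain is q_low (the worst case inside the region)."""
    if rate == 0:
        return 0.0
    if q_low == 0:
        return math.inf        # first region: any positive rate needs infinite power
    return math.log(kappa1 / ber_max) * (2 ** rate - 1) / (kappa2 * q_low)


kappa1, kappa2, ber_max = 0.2, 1.5, 1e-3   # illustrative constants
p = t_inst_ber(2.0, 0.5, kappa1, kappa2, ber_max)
print(p, ber(0.5, p, 2.0, kappa1, kappa2), ber(0.9, p, 2.0, kappa1, kappa2))
```

Setting `kappa2 = math.log(kappa1 / ber_max)` makes the function collapse to $(2^x - 1)/q_{m,k,[J]_{m,k}}$, the outage-capacity form mentioned above.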

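If QoS requirements dictate that for every region, channel, and user a maximum average BER $\bar\epsilon$ can be tolerated, then $T(x)$ is an implicit function:
$$T_{\mathcal{R}([J]_{m,k})} = \Big\{x \mapsto y : \bar\epsilon = \int_{q_{m,k,[J]_{m,k}}}^{q_{m,k,[J]_{m,k}+1}} \epsilon(g_{m,k}, y, x)\, \frac{e^{-g_{m,k}/\bar g_{m,k}}}{\bar g_{m,k}\Pr\{[J]_{m,k}\}}\, dg_{m,k}\Big\}$$
$$= \Big\{x \mapsto y : \bar\epsilon = \kappa_1\Big(e^{-q_{m,k,[J]_{m,k}}\left(\frac{\kappa_2 y}{2^x - 1} + \frac{1}{\bar g_{m,k}}\right)} - e^{-q_{m,k,[J]_{m,k}+1}\left(\frac{\kappa_2 y}{2^x - 1} + \frac{1}{\bar g_{m,k}}\right)}\Big)\Big(e^{-q_{m,k,[J]_{m,k}}/\bar g_{m,k}} - e^{-q_{m,k,[J]_{m,k}+1}/\bar g_{m,k}}\Big)^{-1}\Big(1 + \frac{\kappa_2\, y\, \bar g_{m,k}}{2^x - 1}\Big)^{-1}\Big\}.$$

It can be shown that $T(x)$ can be written as an explicit function of the optimum rate, $[\mu]_m$, and $[\lambda]_m$ as
$$T_{\mathcal{R}([J]_{m,k})}(x) = \ldots \qquad (26)$$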
Convexity of (25) and (26) is established in Appendix G. Clearly, alternative $T(x)$ functions satisfying (as1) can be derived for modulations whose BER does not satisfy (as5). For example, any $\epsilon(g_{m,k}, p_{m,k}, r_{m,k})$ that is increasing w.r.t. $r_{m,k}$ and decreasing w.r.t. $p_{m,k}$, while being jointly convex w.r.t. $p_{m,k}$ and $r_{m,k}$, will give rise to a strictly convex $T(x)$.

From an implementation perspective, not having $T_{\mathcal{R}([J]_{m,k})}$ in closed form (and thus not having $T^{-1}_{\mathcal{R}([J]_{m,k})}$ in closed form) does not necessarily incur a major penalty in terms of computational complexity. Since those expressions do not change with time, the computational burden can be reduced by characterizing them over the domain of interest only once and reusing those characterizations in each iteration.
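Following this suggestion, the implicit $T$ for the average-BER requirement can be characterized once and then looked up. The sketch below uses the closed-form region-averaged BER of the implicit definition above, solves for the power by bisection (the averaged BER decreases in the power), tabulates $T$ on a rate grid, and interpolates linearly. The grid, constants, and function names are assumptions for illustration.

```python
import bisect
import math


def avg_ber(p, r, lo, hi, g_avg, kappa1, kappa2):
    """Region-averaged BER of (as5) for an Exp(g_avg) gain conditioned on
    g in [lo, hi) (closed form of the integral in the implicit definition)."""
    s = p * kappa2 / (2 ** r - 1) + 1.0 / g_avg        # decay rate of the integrand
    prob = math.exp(-lo / g_avg) - math.exp(-hi / g_avg)
    return kappa1 * (math.exp(-lo * s) - math.exp(-hi * s)) / (g_avg * s * prob)


def t_avg_ber(r, lo, hi, g_avg, kappa1, kappa2, ber_bar, iters=100):
    """Implicit T: power whose region-averaged BER equals ber_bar (< kappa1).
    avg_ber decreases in p, so a doubling bracket plus bisection suffices."""
    a, b = 0.0, 1.0
    while avg_ber(b, r, lo, hi, g_avg, kappa1, kappa2) > ber_bar:
        b *= 2.0
    for _ in range(iters):
        mid = 0.5 * (a + b)
        if avg_ber(mid, r, lo, hi, g_avg, kappa1, kappa2) > ber_bar:
            a = mid
        else:
            b = mid
    return b


def lookup(r, rates, powers):
    """Piecewise-linear interpolation in a table computed once, off-line."""
    i = min(max(bisect.bisect_left(rates, r), 1), len(rates) - 1)
    r0, r1 = rates[i - 1], rates[i]
    return powers[i - 1] + (powers[i] - powers[i - 1]) * (r - r0) / (r1 - r0)


region = dict(lo=0.5, hi=1.0, g_avg=1.0, kappa1=0.2, kappa2=1.5, ber_bar=1e-3)
rates = [0.5 * i for i in range(1, 9)]                 # 0.5, 1.0, ..., 4.0 bits
powers = [t_avg_ber(r, **region) for r in rates]       # characterized once
print(lookup(2.25, rates, powers), t_avg_ber(2.25, **region))
```

Because $T$ is convex, linear interpolation between grid points slightly overestimates the required power, which errs on the safe side of the BER constraint; a finer grid reduces the excess.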

VI. Numerical Examples 

To test the developed algorithms, we simulated uncorrelated complex Gaussian fading channels per user adhering to (as2) and (as3), and quantized each channel gain $g_{m,k}$ into $L_{m,k} = L = 4$ regions using the low-complexity channel quantizer in [13, Sec. IV-B]. The power-rate function considered is $T_{\mathcal{R}([J]_{m,k})}(x) = (2^x - 1)/g^{\min}_{m,k}([J]_{m,k})$, derived from the outage-capacity formula in the earlier example. Recall that, as discussed in Section V-D, a properly scaled version of this function is also valid for a maximum instantaneous BER requirement [cf. (25)].

Test Case 1 (Convergence of off-line iterations): A time-division multiple access (TDMA) system was simulated with $K = 16$ uncorrelated channels serving $M = 4$ users with minimum rate requirements $\bar r = [4, 8, 12, 16]$ and an average SNR of 6 dB. The upper plots in Fig. 2 depict the average individual rates versus the off-line iterations for: (i) the subgradient iteration based on the optimal policies in (15) with a decreasing stepsize $\beta^{(i)}$ (top left); and (ii) the iterations based on the smooth policies in (17) with $\epsilon = 0.05$ and $\beta = 10^{-2}$ (top right). The trajectories confirm that, while the iterations based on the optimal scheduling do not always satisfy the constraints and the rate allocation hovers around its optimum, the smooth policy converges in a finite number of iterations. The trajectories of the transmit powers, shown in the lower plots of Fig. 2, behave similarly to those of the transmit rates.

To complement the analysis, Fig. 3 shows the trajectories of the Lagrange multipliers. In agreement with the analytical results, convergence occurs for both the optimal iterations [cf. (15)] and the smooth iterations [cf. (17)]. As explained in Section IV, the hovering observed in Fig. 2 is due to the discontinuities of the optimal policy w.r.t. $\lambda^R$. While Fig. 3 corroborates that the iterations in (15) come ever closer to the convergence point in the dual domain ($\lambda^{R*}$), Fig. 2 illustrates that they fail to guarantee the same in the primal domain. In contrast, the Lipschitz continuity of the smooth scheduling policy guarantees convergence in both the dual and primal domains.

Based on both figures, it appears that in this specific case users 2 and 3 would have to share at least one channel. However, when they implement the optimum winner-takes-all scheduling, they keep competing to be the single winner of the channel. This competition ends only when the exact value of $\lambda^{R*}$ is found, which can only be guaranteed after an infinite number of iterations.

The numerical tests reveal that the difference between the average power consumed by the smooth policy and that consumed by the optimum policy was 0.01. This is considerably smaller than the bound $\epsilon' = K\epsilon = 0.8$ given in Proposition 4. As explained in footnote 4, this bound is expected to be loose, since it is derived for the worst-case scenario.

Test Case 2 (Convergence of the stochastic schemes): The set-up of Test Case 1 is now used to gauge the convergence of the smooth stochastic schemes in (18). The left plot in Fig. 4 depicts the trajectories of the sample average rate $r_m[n] := \frac{1}{n}\sum_{q=1}^{n}\sum_{k=1}^{K}[R^*(J[q],\lambda^R[q])]_{m,k}[W^s(J[q],\lambda^R[q])]_{m,k}$ versus the time index (online iterations) for every user, while the right plot depicts the corresponding trajectories of the sample average power $p_m[n]$. The figure illustrates not only that the stochastic schemes achieve the same performance as the optimum off-line schemes (dotted lines), but also that they converge within a few hundred iterations.

Fig. 2. Trajectories of average transmit rates (top) and transmit powers (bottom) for off-line iterations. The iterations based on the optimal non-smooth policy are shown on the left, while the iterations based on the smooth policy are shown on the right.

To gain more insight into the behavior of the stochastic schemes, Fig. 5 depicts the corresponding trajectories of the Lagrange multipliers $[\lambda^R[n]]_m$ for two different stepsizes: $\beta = 10 \cdot 10^{-3}$ (left column) and $\beta = 2 \cdot 10^{-3}$ (right column). To facilitate visualization, the trajectories of users 4 and 2 are shown in a different plot (top) from those of users 3 and 1 (bottom). For comparison purposes, the trajectories of the off-line iterations (with $i = n$) are also plotted using dotted lines. As stated in Proposition 5: (i) the trajectories of the online iterations remain locked to those of the off-line iterations; and (ii) the smaller the stepsize, the smaller the gap between the online and off-line iterations.
Fig. 3. Trajectories of the Lagrange multipliers for off-line iterations. The iterations based on the optimal non-smooth policy (and decreasing stepsize) are shown on the left, while the iterations based on the smooth policy (and constant stepsize) are shown on the right.

Fig. 4. Trajectories of the sample average rate (left) and sample average power (right) for online iterations. Ensemble values achieved by the off-line policy are represented as dotted lines.



Test Case 3 (Performance comparison): An OFDMA system was simulated here with $K = 64$ subcarriers serving $M = 3$ users with $\bar r = [40, 70, 100]^T$, transmitting over a multipath fading channel with eight taps and exponentially decaying gains. Fig. 6 compares the overall average transmit power for different SNR values. Results for five resource allocation (RA) policies are depicted: (i) the benchmark allocation obtained when P-CSI is available (RA1) [19]; (ii) the optimum Q-CSIT-based policy with the equally probable channel quantizer of [12, Sec. V-B] (RA2); (iii) the smooth policy developed here with the equally probable channel quantizer of [12, Sec. V-B] (RA3); (iv) this paper's smooth policy with a random quantizer (RA4); and (v) a Q-CSI-based policy that optimally adapts $R$ but fixes the channel scheduling matrix $W$ and uses an on/off scheme for the power allocation $P$ (RA5). Not only is the difference in power consumption between (RA2) and (RA3) negligible, but their difference w.r.t. the optimum P-CSIT policy (RA1) is small even for a (sub)optimum channel quantizer. This is corroborated by the results for (RA4), which show that the power penalty for using a random quantizer is around 1 dB. Finally, it is worth stressing the 6-8 dB power savings of (RA3) relative to the heuristic scheme (RA5).

Fig. 5. Trajectories of estimated Lagrange multipliers $[\lambda^R[n]]_m$ for online iterations (solid lines). For comparison purposes, trajectories of the off-line iterations are also plotted (dotted lines).

Further numerical results assessing the performance of the RA1, RA3 and RA5 schemes over a wide range of parameter values are summarized in Table I. These results confirm our previous conclusions, namely: (i) the near-optimality of RA3, and (ii) the performance loss exhibited by heuristic schemes, exemplified by RA5. Results also show that the power savings obtained by implementing the optimum schemes are higher when a more demanding set-up is simulated. This was expected because, for easier scenarios (lower rate requirements, smaller number of users), "reasonable" heuristic policies can lead to a good solution.

Test Case 4 (Sensitivity to the number of quantization regions): Table II lists the average transmit-power versus $L_k$ for a set-up with M = 3 users and two different average rate requirements. The results are consistent with orthogonal multiuser access based on Q-CSIT [13], [18]. They show that relying on quantized CSI leads to a power loss of only about 2-4 dB w.r.t. the P-CSIT case ($L_k = \infty$) if $L_k \geq 2$. (Recall that for the simulated scenario, the lowest region will be inactive; hence, $L_k = 2$ implies one active region and one zero-rate/zero-power region.) Moreover, the power gap shrinks as the number of regions increases. It reaches a power loss of only about 1 dB with $L_k = 8$ regions (3 feedback bits per channel).

Fig. 6. Comparison of various resource allocation schemes on the basis of average transmit-power [dB].

TABLE I
Total average weighted power [dB] for RA1, RA3 and RA5 schemes. (Reference case: K = 64, M = 3, $\bar{\mathbf{r}} = [40, 70, 100]^T$, SNR = 6 dB; other cases describe variation(s) w.r.t. the reference case.)

| Case | RA5 | RA3 | RA1 |
|---|---|---|---|
| Reference case | 29.9 | 21.7 | 19.9 |
| $[\bar{\mathbf{r}}]_m = 50$ | 22.6 | 18.3 | 16.2 |
| $[\bar{\mathbf{r}}]_m = 70$ | 26.8 | 21.7 | 19.6 |
| K = 128 | 22.2 | 18.3 | 16.3 |
| M = 6, $\bar{\mathbf{r}} = [40, 52, 64, 76, 88, 100]^T$ | 45.6 | 31.0 | 28.9 |
| $\mathcal{T}$ as in (23) | 27.8 | 20.8 | 19.9 |
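The gaps quoted in Test Case 4 can be recomputed directly from the Table II entries. The following sketch does so. It assumes $\lceil \log_2 L \rceil$ feedback bits per channel, which is consistent with the 3 bits quoted for $L_k = 8$. The dictionary keys and helper names are illustrative only.

```python
import math

# Average power [dB] from Table II (RA3, M = 3, K = 64, SNR = 6 dB).
# The last entry of each row (L = inf) is the P-CSIT benchmark.
regions = [2, 3, 4, 5, 6, 8, math.inf]
power_db = {
    "r=[50,50,50]": [20.4, 19.0, 18.3, 17.9, 17.6, 17.2, 16.2],
    "r=[40,70,100]": [24.1, 22.4, 21.7, 21.4, 21.2, 20.9, 19.9],
}


def feedback_bits(num_regions):
    """Bits needed to index num_regions quantization regions of one channel (assumed ceil(log2 L))."""
    return math.ceil(math.log2(num_regions))


def power_gaps(row):
    """Power loss (dB) of each finite-L entry w.r.t. the P-CSIT entry, rounded to 0.1 dB."""
    reference = row[-1]
    return [round(p - reference, 1) for p in row[:-1]]


for label, row in power_db.items():
    for L, gap in zip(regions[:-1], power_gaps(row)):
        print(f"{label}: L={L} ({feedback_bits(L)} bits) -> {gap} dB")
```

With both rate profiles the loss falls from about 4.2 dB at two regions to about 1 dB at eight regions.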

VII. Concluding Summary 

This paper developed optimal scheduling and resource allocation policies for orthogonal multi-access transmissions over fading channels when both terminals and scheduler(s) have to rely only on quantized CSI. The focus has been on minimizing average power subject to average rate (capacity) constraints. The results presented also apply when maximizing rate (capacity) subject to average power constraints.

TABLE II
Total average weighted power for different numbers of regions per channel. (RA3 with M = 3, K = 64, and SNR = 6 dB ∀m is implemented.)

| # of regions per channel | 2 | 3 | 4 | 5 | 6 | 8 | $\infty$ |
|---|---|---|---|---|---|---|---|
| Average power [dB], $\bar{\mathbf{r}} = [50, 50, 50]^T$ | 20.4 | 19.0 | 18.3 | 17.9 | 17.6 | 17.2 | 16.2 |
| Average power [dB], $\bar{\mathbf{r}} = [40, 70, 100]^T$ | 24.1 | 22.4 | 21.7 | 21.4 | 21.2 | 20.9 | 19.9 |


Relative to systems with perfect CSI at the scheduler and channels with continuous fading, the main differences of the optimal policies show up in channel scheduling. It was shown that for most channel realizations the optimum scheduling amounts to a single (winner) user accessing the channel. For a smaller set of realizations, a few users share the resources. The optimal allocation in the sharing case is obtained as the solution of a linear program. This disjoint scheduling policy is also present in systems that exploit perfect CSI but operate over channels that are deterministic or have a discrete fading distribution.

Having two different policies to schedule users has two drawbacks. It incurs higher complexity relative to the winner-takes-all case, and it complicates finding the optimum Lagrange multipliers needed to implement the optimal policies. To mitigate these challenges, a new scheduling scheme that combines the two schedulers into a single one was developed. It was proved that this single scheme offers reduced complexity, facilitates finding the optimal Lagrange multipliers, and exhibits asymptotically optimal performance. Moreover, to facilitate practical implementation, stochastic schemes were also developed. These schemes do not need knowledge of the channel distribution, keep track of channel non-stationarities, reduce complexity, and converge to the optimum solution. The last part of the paper was devoted to analyzing the overhead associated with the novel schemes. It also presented practical scenarios where the derived optimal policies can be implemented.

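Appendix A: Proof of Convexity of the Primal Problem

Let $\mathbf{x}$ collect all the optimization variables of the primal problem. Its convexity can be ensured if the cost function and all the constraints, generically denoted by $\phi(\mathbf{x})$, satisfy

$\mathcal{T}_i := \partial^2\phi/\partial x_i^2 \geq 0,\ \forall i$, and
$\mathcal{T}_{i,j} := (\partial^2\phi/\partial x_i^2)(\partial^2\phi/\partial x_j^2) - (\partial^2\phi/\partial x_i\partial x_j)^2 \geq 0,\ \forall i,j.$

All constraints are linear functions, so both conditions hold for them $\forall x_i, x_j$, and only the objective cost function $C$ must be checked. The entries of $\mathbf{R}$ are decoupled in $C$ (the cross-derivatives are zero), and the same happens with the entries of $\mathbf{W}$. Hence, it suffices to consider three cases: $\mathcal{T}^C_{[\mathbf{R}]_{m,k}}$, $\mathcal{T}^C_{[\mathbf{W}]_{m,k}}$, and $\mathcal{T}^C_{[\mathbf{R}]_{m,k},[\mathbf{W}]_{m,k}}$. For notational brevity, define $r := [\mathbf{R}(\mathbf{J})]_{m,k}$ and $w := [\mathbf{W}(\mathbf{J})]_{m,k}$. The second derivatives are:

$$\frac{\partial^2 C}{\partial r^2} = \frac{\partial}{\partial r}\left(\mathcal{T}'\!\left(\frac{r}{w}\right)\right) = \mathcal{T}''\!\left(\frac{r}{w}\right)\frac{1}{w} \tag{28}$$

$$\frac{\partial^2 C}{\partial w^2} = \frac{\partial}{\partial w}\left(\mathcal{T}\!\left(\frac{r}{w}\right) - \mathcal{T}'\!\left(\frac{r}{w}\right)\frac{r}{w}\right) = \mathcal{T}''\!\left(\frac{r}{w}\right)\frac{r^2}{w^3} \tag{29}$$

$$\frac{\partial^2 C}{\partial w\,\partial r} = \frac{\partial}{\partial w}\left(\mathcal{T}'\!\left(\frac{r}{w}\right)\right) = -\mathcal{T}''\!\left(\frac{r}{w}\right)\frac{r}{w^2}. \tag{30}$$

Expressions (28)-(30) yield $\mathcal{T}^C_{[\mathbf{R}]_{m,k},[\mathbf{W}]_{m,k}} = 0$. Both $\mathcal{T}^C_{[\mathbf{R}]_{m,k}} > 0$ and $\mathcal{T}^C_{[\mathbf{W}]_{m,k}} > 0$ hold provided that $\mathcal{T}'' > 0$. Hence, the primal problem is convex if $\mathcal{T}$ is a convex function.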
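The first equalities in (28)-(30) correspond to a generic cost term of the form $w\,\mathcal{T}(r/w)$. The following sketch checks the closed-form Hessian entries against central finite differences and confirms that the 2 × 2 minor $\mathcal{T}^C_{[\mathbf{R}]_{m,k},[\mathbf{W}]_{m,k}}$ vanishes. For illustration only, it assumes $\mathcal{T}(x) = 2^x - 1$; this is not claimed to be the power-rate function used in the paper.

```python
import math


def T(x):
    """Illustrative convex power-rate function (assumption): T(x) = 2**x - 1."""
    return 2.0 ** x - 1.0


def T2(x):
    """Second derivative of the illustrative T: ln(2)**2 * 2**x."""
    return math.log(2.0) ** 2 * 2.0 ** x


def cost(r, w):
    """Perspective term w*T(r/w), whose derivatives appear in (28)-(30)."""
    return w * T(r / w)


def hessian_closed_form(r, w):
    t2 = T2(r / w)
    return [[t2 / w, -t2 * r / w ** 2],            # (28), (30)
            [-t2 * r / w ** 2, t2 * r ** 2 / w ** 3]]  # (30), (29)


def hessian_numeric(f, r, w, h=1e-4):
    """Central finite differences for the 2x2 Hessian of f(r, w)."""
    f_rr = (f(r + h, w) - 2 * f(r, w) + f(r - h, w)) / h ** 2
    f_ww = (f(r, w + h) - 2 * f(r, w) + f(r, w - h)) / h ** 2
    f_rw = (f(r + h, w + h) - f(r + h, w - h)
            - f(r - h, w + h) + f(r - h, w - h)) / (4 * h ** 2)
    return [[f_rr, f_rw], [f_rw, f_ww]]


r, w = 1.5, 0.6
H = hessian_closed_form(r, w)
H_num = hessian_numeric(cost, r, w)
det = H[0][0] * H[1][1] - H[0][1] ** 2   # the 2x2 minor, zero up to rounding
print(H, H_num, det)
```

The Hessian has a positive diagonal and a zero determinant, so it is positive semidefinite whenever $\mathcal{T}'' > 0$ and $w > 0$.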

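Appendix B: Proof of Proposition 2

Using (·) and the fact that the multipliers must be non-negative, (·) and (·) can be manipulated to yield

$$\left([\mathbf{C}_W(\mathbf{J})]_{m,k}\Pr\{\mathbf{J}\} + [\boldsymbol{\lambda}^{W*}(\mathbf{J})]_k\right)[\mathbf{W}^*(\mathbf{J})]_{m,k} = 0,\quad \forall m \tag{31}$$

$$[\boldsymbol{\alpha}^{W*}(\mathbf{J})]_{m,k} = [\mathbf{C}_W(\mathbf{J})]_{m,k}\Pr\{\mathbf{J}\} + [\boldsymbol{\lambda}^{W*}(\mathbf{J})]_k \geq 0,\quad \forall m \tag{32}$$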

6 The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official 
policies, either expressed or implied, of the Army Research Laboratory or the U. S. Government. 
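$$[\boldsymbol{\lambda}^{W*}(\mathbf{J})]_k \geq 0. \tag{33}$$

The complementary slackness KKT condition corresponding to the user-scheduling constraint also implies that

$$[\boldsymbol{\lambda}^{W*}(\mathbf{J})]_k\left(\sum_{m=1}^{M}[\mathbf{W}^*(\mathbf{J})]_{m,k} - 1\right) = 0,\quad \forall k. \tag{34}$$

Based on (31)-(34), we have that:

(i) Membership $m \in \mathcal{M}(\mathbf{J},k)$ requires the cost to be both negative and minimum, so we have to prove both properties.

First, suppose $[\mathbf{W}^*(\mathbf{J})]_{m',k} > 0$ for a user $m'$ whose cost $[\mathbf{C}_W(\mathbf{J})]_{m',k}$ is positive. Since $[\boldsymbol{\lambda}^{W*}(\mathbf{J})]_k \geq 0$, both factors in (31) are positive, namely $([\mathbf{C}_W(\mathbf{J})]_{m',k}\Pr\{\mathbf{J}\} + [\boldsymbol{\lambda}^{W*}(\mathbf{J})]_k)$ and $[\mathbf{W}^*(\mathbf{J})]_{m',k}$. This contradicts the equality required by (31).

Suppose now that $[\mathbf{W}^*(\mathbf{J})]_{m',k} > 0$ for a user $m'$ such that $[\mathbf{C}_W(\mathbf{J})]_{m',k} > [\mathbf{c}^*_W(\mathbf{J})]_k$. Then, satisfying (31) for user $m'$ requires $[\boldsymbol{\lambda}^{W*}(\mathbf{J})]_k = -[\mathbf{C}_W(\mathbf{J})]_{m',k}\Pr\{\mathbf{J}\}$. Substituting this value into (32) gives the multiplier for a user $m_k \in \mathcal{M}(\mathbf{J},k)$:

$[\boldsymbol{\alpha}^{W*}(\mathbf{J})]_{m_k,k} = [\mathbf{c}^*_W(\mathbf{J})]_k\Pr\{\mathbf{J}\} - [\mathbf{C}_W(\mathbf{J})]_{m',k}\Pr\{\mathbf{J}\}.$

This value is negative and hence contradicts the right-hand side of (32).

(ii) If $|\mathcal{M}(\mathbf{J},k)| > 0$, then $[\mathbf{C}_W(\mathbf{J})]_{m,k} < 0$ for $m \in \mathcal{M}(\mathbf{J},k)$. This requires $[\boldsymbol{\lambda}^{W*}(\mathbf{J})]_k > 0$ in (32). Substituting the latter into (34), the statement follows.

(iii) By construction, $|\mathcal{M}(\mathbf{J},k)| = 0$ if and only if $[\mathbf{C}_W(\mathbf{J})]_{m,k} > 0$, $\forall m$. Hence, if $|\mathcal{M}(\mathbf{J},k)| = 0$, then (32) is strictly positive $\forall m$, and (31) can only hold if $[\mathbf{W}^*(\mathbf{J})]_{m,k} = 0$, $\forall m$.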
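The following single-channel sketch illustrates the argument above. It pairs the winner-takes-all choice with multipliers built as in (31)-(34), then verifies the KKT conditions. The cost vectors, the probability value and all function names are hypothetical. Ties among several minimum-cost users, where the optimal scheduling may split the channel, are resolved here by picking the first minimizer.

```python
def schedule_channel(costs, prob):
    """Winner-takes-all solution of min sum_m c_m*Pr*w_m s.t. sum_m w_m <= 1, w_m >= 0.

    Returns the weights plus the multipliers lambda_k = max(0, -min_m c_m*Pr)
    and alpha_m = c_m*Pr + lambda_k, mirroring (32)-(33).
    """
    m_star = min(range(len(costs)), key=lambda m: costs[m])
    weights = [0.0] * len(costs)
    if costs[m_star] < 0:              # M(J,k) non-empty: the winner takes the channel
        weights[m_star] = 1.0
    lam = max(0.0, -costs[m_star] * prob)
    alpha = [c * prob + lam for c in costs]
    return weights, lam, alpha


def kkt_holds(weights, lam, alpha, tol=1e-12):
    """Check (31)-(34) for one channel."""
    slack_w = all(abs(a * w) <= tol for a, w in zip(alpha, weights))   # (31)
    dual_feas = all(a >= -tol for a in alpha) and lam >= 0             # (32), (33)
    slack_sum = abs(lam * (sum(weights) - 1.0)) <= tol                 # (34)
    return slack_w and dual_feas and slack_sum


weights, lam, alpha = schedule_channel([0.3, -0.2, -0.5], 0.25)
print(weights, lam, alpha)
```

When every cost is positive, the same function returns all-zero weights and $\lambda_k = 0$, matching case (iii).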



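Appendix C: Proof of Lemma 1

To prove the first part of the lemma, rewrite the Lagrangian in (11) using the cost in (·) as

$$\sum_{\forall\mathbf{J}}\left(\sum_{k=1}^{K}\sum_{m=1}^{M}[\mathbf{C}_W(\mathbf{J},\boldsymbol{\lambda}^R)]_{m,k}[\mathbf{W}(\mathbf{J})]_{m,k}\right)\Pr\{\mathbf{J}\} + \sum_{m=1}^{M}[\boldsymbol{\lambda}^R]_m[\bar{\mathbf{r}}]_m. \tag{35}$$

The dual function can be written as

$$D(\boldsymbol{\lambda}^R) = \sum_{\forall\mathbf{J}}\left(\sum_{k=1}^{K}[\mathbf{c}^*_W(\mathbf{J},\boldsymbol{\lambda}^R)]_k[\mathbf{W}^*(\mathbf{J})]_{m^*,k}\right)\Pr\{\mathbf{J}\} + \sum_{m=1}^{M}[\boldsymbol{\lambda}^R]_m[\bar{\mathbf{r}}]_m \tag{36}$$

and the smooth version of the dual function as

$$D^s(\boldsymbol{\lambda}^R) = \sum_{\forall\mathbf{J}}\left(\sum_{k=1}^{K}\sum_{m\in\mathcal{M}(\mathbf{J},k)}[\mathbf{C}_W(\mathbf{J},\boldsymbol{\lambda}^R)]_{m,k}[\mathbf{W}^s(\mathbf{J})]_{m,k}\right)\Pr\{\mathbf{J}\} + \sum_{m=1}^{M}[\boldsymbol{\lambda}^R]_m[\bar{\mathbf{r}}]_m. \tag{37}$$

Based on the definition of $\mathcal{M}(\mathbf{J},k)$ and Proposition (·), it follows that $[\mathbf{W}^*(\mathbf{J})]_{m^*,k} = \sum_{m\in\mathcal{M}(\mathbf{J},k)}[\mathbf{W}^s(\mathbf{J})]_{m,k}$, $\forall k$. Using this equality, consider the difference

$$D^s(\boldsymbol{\lambda}^R) - D(\boldsymbol{\lambda}^R) = \sum_{\forall\mathbf{J}}\sum_{k=1}^{K}\left(\sum_{m\in\mathcal{M}(\mathbf{J},k)}\left([\mathbf{C}_W(\mathbf{J},\boldsymbol{\lambda}^R)]_{m,k} - [\mathbf{c}^*_W(\mathbf{J},\boldsymbol{\lambda}^R)]_k\right)[\mathbf{W}^s(\mathbf{J})]_{m,k}\right)\Pr\{\mathbf{J}\}. \tag{38}$$

By construction, for $m \in \mathcal{M}(\mathbf{J},k)$ it holds that $[\mathbf{C}_W(\mathbf{J},\boldsymbol{\lambda}^R)]_{m,k} - [\mathbf{c}^*_W(\mathbf{J},\boldsymbol{\lambda}^R)]_k \geq 0$. Likewise, $[\mathbf{C}_W(\mathbf{J},\boldsymbol{\lambda}^R)]_{m,k} - [\mathbf{c}^*_W(\mathbf{J},\boldsymbol{\lambda}^R)]_k \leq \epsilon$. Substituting these expressions into (38) yields, respectively,

$$D^s(\boldsymbol{\lambda}^R) - D(\boldsymbol{\lambda}^R) \geq 0 \tag{39}$$

$$D^s(\boldsymbol{\lambda}^R) - D(\boldsymbol{\lambda}^R) \leq \sum_{\forall\mathbf{J}}\sum_{k=1}^{K}\sum_{m\in\mathcal{M}(\mathbf{J},k)}\epsilon\,[\mathbf{W}^s(\mathbf{J})]_{m,k}\Pr\{\mathbf{J}\} \leq \sum_{\forall\mathbf{J}}\sum_{k=1}^{K}\epsilon\Pr\{\mathbf{J}\} = K\epsilon \tag{40}$$

In (39) we have used that $[\mathbf{W}^s(\mathbf{J})]_{m,k} \geq 0$. In (40) we have used that $\sum_{m\in\mathcal{M}(\mathbf{J},k)}[\mathbf{W}^s(\mathbf{J})]_{m,k} \leq 1$. Equations (39) and (40) prove part (i) of Lemma 1.

To establish part (ii), note that $[\partial^s D(\boldsymbol{\lambda}^R)]_m$ can be written as a summation of $[\mathbf{R}^*(\mathbf{J},\boldsymbol{\lambda}^R)]_{m,k}[\mathbf{W}^s(\mathbf{J},\boldsymbol{\lambda}^R)]_{m,k}$ terms. We will therefore show that $[\partial^s D(\boldsymbol{\lambda}^R)]_m$ is Lipschitz continuous w.r.t. $\boldsymbol{\lambda}^R$ by arguing that both $\mathbf{W}^s(\mathbf{J},\boldsymbol{\lambda}^R)$ and $\mathbf{R}^*(\mathbf{J},\boldsymbol{\lambda}^R)$ are Lipschitz continuous w.r.t. $\boldsymbol{\lambda}^R$.

On the one hand, continuity of $\mathbf{W}^s(\mathbf{J},\boldsymbol{\lambda}^R)$ is ensured by Proposition 3-(iii). Obtaining the Lipschitz constant in this case is straightforward, because $[\mathbf{W}^s(\mathbf{J},\boldsymbol{\lambda}^R)]_{m,k}$ is differentiable by construction [cf. (16)].

On the other hand, $[\mathbf{R}^*(\mathbf{J},\boldsymbol{\lambda}^R)]_{m,k}$ depends only on the $m$th entry of $\boldsymbol{\lambda}^R$ [cf. Proposition 3], so it suffices to consider how $[\mathbf{R}^*(\mathbf{J})]_{m,k}$ varies with $[\boldsymbol{\lambda}^R]_m$. Since $\mathcal{T}$ is strictly convex, its derivative $\mathcal{T}'$ is a continuous, monotonic, one-to-one function, and so is its inverse $\mathcal{T}'^{-1}$. Continuity of $\mathcal{T}'^{-1}$ implies continuity of $[\mathbf{R}^*(\mathbf{J},\boldsymbol{\lambda}^R)]_{m,k}$ w.r.t. $[\boldsymbol{\lambda}^R]_m$ [cf. (·)]. Its monotonicity, together with the fact that the rate is bounded, gives the Lipschitz property.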
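Part (i) only uses two facts: every smoothly scheduled user is within $\epsilon$ of the minimum cost, and the smooth weights sum to at most one. The following sketch illustrates the per-channel version of (38)-(40) numerically, with the probability factor dropped. The weight rule $[\mathbf{W}^s]_m \propto n_m^2$ with $n_m = 1 - ([\mathbf{C}_W]_{m,k} - [\mathbf{c}^*_W]_k)/\epsilon$ is an assumption, chosen to match the derivatives used in Appendix D, because (16) is not reproduced here. The random costs are synthetic.

```python
import random


def smooth_weights(costs, eps):
    """Hypothetical smooth scheduler: users within eps of the best (negative) cost
    share the channel with weights n_m**2 / sum(n**2), n_m = 1 - (C_m - c*)/eps."""
    c_star = min(costs)
    if c_star >= 0:                       # nobody is scheduled on this channel
        return [0.0] * len(costs)
    n = [max(0.0, 1.0 - (c - c_star) / eps) for c in costs]
    d = sum(x * x for x in n)
    return [x * x / d for x in n]


def per_channel_gap(costs, eps):
    """Contribution of one channel to D^s - D in (38), probability factor omitted."""
    c_star = min(costs)
    if c_star >= 0:
        return 0.0
    w_s = smooth_weights(costs, eps)
    return sum((c - c_star) * w for c, w in zip(costs, w_s))


rng = random.Random(0)                    # synthetic costs, fixed seed
eps, K = 0.1, 8
gaps = [per_channel_gap([rng.uniform(-1.0, 0.5) for _ in range(3)], eps) for _ in range(K)]
print(max(gaps), sum(gaps))               # each gap lies in [0, eps), the total below K*eps
```

Any other smooth rule with non-negative weights summing to one over users within $\epsilon$ of the minimum would satisfy the same bounds.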

Appendix D: Properties of the Updating Matrices

This appendix analyzes the behavior of the smooth subgradient in Lemma 1. The main result is summarized in Lemma 2, which is critical for proving convergence of both the off-line iterations in Proposition 4 and the online iterations in Proposition 5.

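Define $\mathbf{f}^{av}$ and $\mathbf{f}$ as $M \times 1$ vector-valued functions with entries

$$[\mathbf{f}(\mathbf{J},\boldsymbol{\lambda}^R)]_m := [\bar{\mathbf{r}}]_m - \sum_{\forall k}[\mathbf{R}^*(\mathbf{J},\boldsymbol{\lambda}^R)]_{m,k}[\mathbf{W}^s(\mathbf{J},\boldsymbol{\lambda}^R)]_{m,k} \tag{41}$$

$$[\mathbf{f}^{av}(\boldsymbol{\lambda}^R)]_m := [\bar{\mathbf{r}}]_m - \sum_{\forall\mathbf{J}}\sum_{\forall k}[\mathbf{R}^*(\mathbf{J},\boldsymbol{\lambda}^R)]_{m,k}[\mathbf{W}^s(\mathbf{J},\boldsymbol{\lambda}^R)]_{m,k}\Pr\{\mathbf{J}\} = \sum_{\forall\mathbf{J}}[\mathbf{f}(\mathbf{J},\boldsymbol{\lambda}^R)]_m\Pr\{\mathbf{J}\} \tag{42}$$

These functions coincide with the instantaneous and average smooth subgradients $\partial^s D^s(\boldsymbol{\lambda}^R,n)$ (Section IV-A) and $\partial^s D(\boldsymbol{\lambda}^R)$ (Section IV), respectively.

The $M \times M$ Jacobian matrices of those functions are $[\mathbf{A}^s(\mathbf{J})]_{q,m} := \partial[\mathbf{f}(\mathbf{J},\boldsymbol{\lambda}^R)]_q/\partial[\boldsymbol{\lambda}^R]_m$ and $[\mathbf{A}^s]_{q,m} := \sum_{\forall\mathbf{J}}[\mathbf{A}^s(\mathbf{J})]_{q,m}\Pr\{\mathbf{J}\}$, respectively. Since the entries of $\mathbf{f}$ depend on $\mathbf{R}^*$ and $\mathbf{W}^s$, it follows that

$$\mathbf{A}^s(\mathbf{J}) := -\left(\mathbf{A}^s_R(\mathbf{J}) + \mathbf{A}^s_W(\mathbf{J})\right), \text{ where} \tag{43}$$

$$[\mathbf{A}^s_R(\mathbf{J})]_{q,m} := \sum_{\forall k}[\mathbf{W}^s(\mathbf{J},\boldsymbol{\lambda}^R)]_{q,k}\,\partial[\mathbf{R}^*(\mathbf{J},\boldsymbol{\lambda}^R)]_{q,k}/\partial[\boldsymbol{\lambda}^R]_m \text{ and} \tag{44}$$

$$[\mathbf{A}^s_W(\mathbf{J})]_{q,m} := \sum_{\forall k}[\mathbf{R}^*(\mathbf{J},\boldsymbol{\lambda}^R)]_{q,k}\,\partial[\mathbf{W}^s(\mathbf{J},\boldsymbol{\lambda}^R)]_{q,k}/\partial[\boldsymbol{\lambda}^R]_m. \tag{45}$$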

Lemma 2: Matrices $\mathbf{A}^s(\mathbf{J})$ and $\mathbf{A}^s$ (i) are negative definite and (ii) have bounded eigenvalues.

Proof: Since $\mathbf{A}^s$ is a weighted sum of $\mathbf{A}^s(\mathbf{J})$, it suffices to prove (i) and (ii) for $\mathbf{A}^s(\mathbf{J})$. To simplify notation, consider a single channel and drop the subindex $k$ (the extension to $K > 1$ is straightforward). To prove (i), we first show that $\mathbf{A}^s_R(\mathbf{J})$ is positive definite (PD), and then that $\mathbf{A}^s_W(\mathbf{J})$ is semi-PD (SPD). Thus, the sum of both is PD and $\mathbf{A}^s(\mathbf{J})$ is negative definite.

Clearly, the derivative of the rate in (8) w.r.t. $[\boldsymbol{\lambda}^R]_m$ is zero if $q \neq m$; hence, $\mathbf{A}^s_R(\mathbf{J})$ is diagonal. Using the inverse function theorem, its diagonal entries are

|Ai(J)k » = tmb^u ik- Vm - <46) 
Since $\mathcal{T}$ is assumed strictly convex and the rate is bounded, the diagonal elements in (46) are finite, positive and nonzero; thus, $\mathbf{A}^s_R(\mathbf{J})$ is PD.

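To prove that $\mathbf{A}^s_W(\mathbf{J})$ is SPD, first define two matrices. Let $\mathbf{D}_R(\mathbf{J})$ be an $M \times M$ diagonal matrix with entries $[\mathbf{D}_R(\mathbf{J})]_{m,m} := [\mathbf{R}^*(\mathbf{J},\boldsymbol{\lambda}^R)]_m$. Let $\mathbf{A}^s_C(\mathbf{J})$ be the matrix with entries $[\mathbf{A}^s_C(\mathbf{J})]_{q,m} := -\partial[\mathbf{W}^s(\mathbf{J},\boldsymbol{\lambda}^R)]_q/\partial[\mathbf{C}_W(\mathbf{J},\boldsymbol{\lambda}^R)]_m$.

Since $\mathbf{W}^s(\mathbf{J},\boldsymbol{\lambda}^R)$ can also be written as a function of $\mathbf{C}_W(\mathbf{J},\boldsymbol{\lambda}^R)$ [cf. (16)], $\mathbf{A}^s_C(\mathbf{J})$ is the Jacobian of the vector function $[[\mathbf{W}^s(\mathbf{J},\boldsymbol{\lambda}^R)]_1, \ldots, [\mathbf{W}^s(\mathbf{J},\boldsymbol{\lambda}^R)]_M]$ w.r.t. the vector variable $-[[\mathbf{C}_W(\mathbf{J},\boldsymbol{\lambda}^R)]_1, \ldots, [\mathbf{C}_W(\mathbf{J},\boldsymbol{\lambda}^R)]_M]$. Based on these definitions, $\mathbf{A}^s_W(\mathbf{J})$ can be written as

$$\mathbf{A}^s_W(\mathbf{J}) := \mathbf{D}_R(\mathbf{J})\,\mathbf{A}^s_C(\mathbf{J})\,\mathbf{D}_R(\mathbf{J}). \tag{47}$$

The multiplication from the left corresponds to the rate product in the definition of $\mathbf{A}^s_W(\mathbf{J})$ in (45). The multiplication from the right represents the derivative of $-\mathbf{C}_W(\mathbf{J},\boldsymbol{\lambda}^R)$ w.r.t. $\boldsymbol{\lambda}^R$ (chain rule). A product of the form $\mathbf{X}\mathbf{Y}\mathbf{X}$ with diagonal $\mathbf{X}$ is SPD if both $\mathbf{X}$ and $\mathbf{Y}$ are SPD, and $\mathbf{D}_R(\mathbf{J})$ is PD (diagonal with positive entries). Hence, it suffices to show that $\mathbf{A}^s_C(\mathbf{J})$ is SPD.

To find the entries of $\mathbf{A}^s_C(\mathbf{J})$, four cases have to be considered:

(i) $q \notin \mathcal{M}^s(\mathbf{J})$;
(ii) $q \in \mathcal{M}^s(\mathbf{J})$ and $|\mathcal{M}^s(\mathbf{J})| = 1$;
(iii) $q \in \mathcal{M}^s(\mathbf{J})$, $|\mathcal{M}^s(\mathbf{J})| > 1$ and $[\mathbf{C}_W(\mathbf{J},\boldsymbol{\lambda}^R)]_m > [\mathbf{c}^*_W(\mathbf{J},\boldsymbol{\lambda}^R)]$; and
(iv) $q \in \mathcal{M}^s(\mathbf{J})$, $|\mathcal{M}^s(\mathbf{J})| > 1$ and $[\mathbf{C}_W(\mathbf{J},\boldsymbol{\lambda}^R)]_m = [\mathbf{c}^*_W(\mathbf{J},\boldsymbol{\lambda}^R)]$.

In the first two cases, $[\mathbf{W}^s(\mathbf{J})]_q$ is constant and therefore its derivative is zero. The derivatives in cases (iii) and (iv) are given in (48) and (49), respectively. They follow from manipulating (16) with the definitions $n_m := 1 - ([\mathbf{C}_W(\mathbf{J},\boldsymbol{\lambda}^R)]_m - [\mathbf{c}^*_W(\mathbf{J},\boldsymbol{\lambda}^R)])/\epsilon$ and $d := \sum_{m'\in\mathcal{M}^s(\mathbf{J})} n_{m'}^2$ (recall that $n_m \in [0,1]$ and $n_{m^*} = 1$).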

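$$[\mathbf{A}^s_C(\mathbf{J})]_{m,m} = \frac{2n_m\left(d - n_m^2\right)}{\epsilon d^2},\quad m \neq m^* \tag{48a}$$

$$[\mathbf{A}^s_C(\mathbf{J})]_{q,m} = -\frac{2n_q^2\,n_m}{\epsilon d^2},\quad q \neq m,\ m \neq m^* \tag{48b}$$

$$[\mathbf{A}^s_C(\mathbf{J})]_{m^*,m^*} = \frac{2\sum_{m'\in\mathcal{M}^s(\mathbf{J})\setminus\{m^*\}} n_{m'}}{\epsilon d^2},\quad m = m^* \tag{49a}$$

$$[\mathbf{A}^s_C(\mathbf{J})]_{q,m^*} = -\frac{2\left(n_q + n_q\sum_{m'\in\mathcal{M}^s(\mathbf{J})\setminus\{m^*\}} n_{m'}^2 - n_q^2\sum_{m'\in\mathcal{M}^s(\mathbf{J})\setminus\{m^*\}} n_{m'}\right)}{\epsilon d^2},\quad q \neq m^* \tag{49b}$$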

Matrix $\mathbf{A}^s_C(\mathbf{J})$ has several useful properties:

(i) it has zero column sums;
(ii) it has zero row sums;
(iii) all its diagonal entries are positive; and
(iv) in the columns $m \neq m^*$, all non-diagonal entries are non-positive.

Using (48), (49) and these properties, the following result establishes that $\mathbf{A}^s_C(\mathbf{J})$ is SPD, which concludes the proof of Lemma 2-(i).
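The entries (48)-(49) and the listed properties can be checked numerically. The following sketch assumes the smooth weights take the form $[\mathbf{W}^s]_q = n_q^2/d$. This form is inferred from (48)-(51), since (16) is not reproduced in this appendix. The sketch places all users inside $\mathcal{M}^s(\mathbf{J})$ and compares the closed-form $\mathbf{A}^s_C(\mathbf{J})$ with a central-difference Jacobian of $-\mathbf{W}^s$ w.r.t. $\mathbf{C}_W$. The cost values are arbitrary.

```python
# Hypothetical smooth scheduler (assumption inferred from (48)-(51)):
#   W_q = n_q**2 / d,  n_q = 1 - (C_q - c*)/eps,  d = sum over m' of n_m'**2.
def w_smooth(costs, eps):
    c_star = min(costs)
    n = [1.0 - (c - c_star) / eps for c in costs]
    d = sum(x * x for x in n)
    return [x * x / d for x in n]


def a_c_closed_form(costs, eps):
    """A_C = -dW^s/dC_W from (48a)-(49b); assumes every user lies in M^s(J)."""
    M = len(costs)
    c_star = min(costs)
    m_star = costs.index(c_star)
    n = [1.0 - (c - c_star) / eps for c in costs]
    d = sum(x * x for x in n)
    s1 = sum(n[m] for m in range(M) if m != m_star)       # sum of n_m', m' != m*
    s2 = sum(n[m] ** 2 for m in range(M) if m != m_star)  # sum of n_m'**2, m' != m*
    scale = eps * d ** 2
    A = [[0.0] * M for _ in range(M)]
    for m in range(M):
        for q in range(M):
            if m != m_star and q == m:        # (48a)
                A[q][m] = 2 * n[m] * (d - n[m] ** 2) / scale
            elif m != m_star:                 # (48b)
                A[q][m] = -2 * n[q] ** 2 * n[m] / scale
            elif q == m_star:                 # (49a)
                A[q][m] = 2 * s1 / scale
            else:                             # (49b)
                A[q][m] = -2 * (n[q] + n[q] * s2 - n[q] ** 2 * s1) / scale
    return A


def a_c_numeric(costs, eps, h=1e-6):
    """Central-difference Jacobian of -W^s w.r.t. the cost vector."""
    M = len(costs)
    A = [[0.0] * M for _ in range(M)]
    for m in range(M):
        up, down = list(costs), list(costs)
        up[m] += h
        down[m] -= h
        w_up, w_down = w_smooth(up, eps), w_smooth(down, eps)
        for q in range(M):
            A[q][m] = -(w_up[q] - w_down[q]) / (2 * h)
    return A


costs, eps = [-0.50, -0.47, -0.44], 0.1   # all users within eps of the minimum
A = a_c_closed_form(costs, eps)
A_num = a_c_numeric(costs, eps)
```

With these values $m^* = 0$, so the non-positivity property (iv) applies to columns 1 and 2.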
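Lemma 3: It holds for $\mathbf{A}^s_C(\mathbf{J})$ that: (i) it has one zero eigenvalue; and (ii) it is SPD.

Proof: Proving Lemma 3-(i) only requires considering the products $\mathbf{1}^T\mathbf{A}^s_C(\mathbf{J})$ and $\mathbf{A}^s_C(\mathbf{J})\mathbf{1}$, where $\mathbf{1}$ is the $M \times 1$ all-ones vector. Since $\mathbf{A}^s_C(\mathbf{J})$ has zero column and row sums, $\mathbf{1}^T\mathbf{A}^s_C(\mathbf{J}) = \mathbf{0}^T$ and $\mathbf{A}^s_C(\mathbf{J})\mathbf{1} = \mathbf{0}$. This implies that $\mathbf{1}$ is both a left and a right eigenvector of $\mathbf{A}^s_C(\mathbf{J})$ with associated eigenvalue 0.

The proof of (ii) relies on the structure of $\mathbf{A}^s_C(\mathbf{J})$. According to (48) and (49), all rows and columns of $\mathbf{A}^s_C(\mathbf{J})$ except the $m^*$th have a regular structure. Consider an $M \times M$ matrix $\mathbf{U}$ such that $[\mathbf{U}]_{m,m} := 1$ $\forall m$, $[\mathbf{U}]_{m^*,m} := 1$ $\forall m$, and $[\mathbf{U}]_{m,m'} := 0$ otherwise. Clearly, $\mathbf{U}$ has rank $M$ and the range of $\mathbf{U}^T$ is $\mathbb{R}^M$.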
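Consider now the matrix $\mathbf{V}(\mathbf{J}) := \mathbf{U}\mathbf{A}^s_C(\mathbf{J})\mathbf{U}^T$. Due to the structure of $\mathbf{U}$ and $\mathbf{A}^s_C(\mathbf{J})$, $[\mathbf{V}(\mathbf{J})]_{m,m'} = 0$ if either $m = m^*$ or $m' = m^*$, while $[\mathbf{V}(\mathbf{J})]_{m,m'} = [\mathbf{A}^s_C(\mathbf{J})]_{m,m'}$ otherwise. In words, $\mathbf{V}(\mathbf{J})$ is a copy of $\mathbf{A}^s_C(\mathbf{J})$ in which both the $m^*$th column and the $m^*$th row have been set to zero.

Suppose now that $\mathbf{V}(\mathbf{J})$ is SPD, meaning that $\mathbf{x}^T\mathbf{V}(\mathbf{J})\mathbf{x} \geq 0$ $\forall\mathbf{x} \in \mathbb{R}^M$ or, equivalently, $\mathbf{x}^T\mathbf{U}\mathbf{A}^s_C(\mathbf{J})\mathbf{U}^T\mathbf{x} \geq 0$. Setting $\tilde{\mathbf{x}} = \mathbf{U}^T\mathbf{x}$, which ranges over all of $\mathbb{R}^M$, we conclude that $\tilde{\mathbf{x}}^T\mathbf{A}^s_C(\mathbf{J})\tilde{\mathbf{x}} \geq 0$, and therefore $\mathbf{A}^s_C(\mathbf{J})$ is SPD. The next lemma establishes that $\mathbf{V}(\mathbf{J})$ is in fact SPD. Hence $\mathbf{A}^s_C(\mathbf{J})$ is SPD, as asserted by Lemma 3-(ii).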
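The zeroing effect of $\mathbf{U}$ can be seen on a small example. The following sketch builds $\mathbf{U}$ exactly as defined above. It uses an arbitrary 3 × 3 matrix with zero row and column sums, which stands in for $\mathbf{A}^s_C(\mathbf{J})$ because it shares these two properties. Plain nested lists keep the sketch dependency-free.

```python
def matmul(X, Y):
    """Plain nested-list matrix product."""
    return [[sum(X[i][t] * Y[t][j] for t in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def transpose(X):
    return [list(row) for row in zip(*X)]


def build_u(M, m_star):
    """[U]_{m,m} = 1 for all m, [U]_{m*,m} = 1 for all m, zero elsewhere."""
    U = [[1 if i == j else 0 for j in range(M)] for i in range(M)]
    U[m_star] = [1] * M
    return U


# Arbitrary matrix with zero row and column sums (not taken from the paper).
A = [[1, -2, 1],
     [-1, 3, -2],
     [0, -1, 1]]
m_star = 2
U = build_u(3, m_star)
V = matmul(matmul(U, A), transpose(U))
print(V)  # prints [[1, -2, 0], [-1, 3, 0], [0, 0, 0]]
```

Row $m^*$ of $\mathbf{U}\mathbf{A}$ vanishes because of the zero column sums. Right-multiplying by $\mathbf{U}^T$ then clears column $m^*$ because of the zero row sums.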
Lemma 4: It holds for $\mathbf{V}(\mathbf{J})$ that: (i) it has one zero eigenvalue; and (ii) it is SPD.

Proof: Without loss of generality, assume that $m^* = M$. Define $\mathbf{Q}(\mathbf{J})$ as the $(M-1) \times (M-1)$ matrix whose $m$th column is formed by the first $M-1$ entries of the $m$th column of $\mathbf{V}(\mathbf{J})$. That is, the all-zero column and all-zero row corresponding to the optimum user have been dropped. Clearly, the eigenvalues of $\mathbf{V}(\mathbf{J})$ are all the eigenvalues of $\mathbf{Q}(\mathbf{J})$ plus a zero eigenvalue. Hence, to prove Lemma 4 it suffices to show that $\mathbf{Q}(\mathbf{J})$ is PD.

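To prove that $\mathbf{Q}(\mathbf{J})$ is PD, let $\mathbf{D}_N(\mathbf{J})$ denote an $(M-1) \times (M-1)$ diagonal matrix with positive entries $[\mathbf{D}_N(\mathbf{J})]_{m,m} = n_m$. Recall that $\mathbf{I}_{M-1}$ and $\mathbf{1}_{M-1,M-1}$ denote the $(M-1) \times (M-1)$ identity and all-ones matrices, respectively. Using this notation, (48) can be written in matrix form as

$$\mathbf{Q}(\mathbf{J}) = \frac{2}{\epsilon d^2}\mathbf{D}_N(\mathbf{J})\left[\mathbf{I}_{M-1} + \mathbf{A}_N(\mathbf{J})\right] \tag{50}$$

where

$$\mathbf{A}_N(\mathbf{J}) = \mathrm{Tr}\left(\mathbf{D}_N(\mathbf{J})\mathbf{D}_N(\mathbf{J})\right)\mathbf{I}_{M-1} - \mathbf{D}_N(\mathbf{J})\mathbf{1}_{M-1,M-1}\mathbf{D}_N(\mathbf{J}). \tag{51}$$

Matrix $\mathbf{A}_N(\mathbf{J})$ is SPD because all its eigenvalues are nonnegative. Indeed, the eigenvalues of $\mathbf{A}_N(\mathbf{J})$ are 0 and $\mathrm{Tr}(\mathbf{D}_N(\mathbf{J})\mathbf{D}_N(\mathbf{J}))$, the latter with multiplicity $M-2$. This property implies that the factor $\mathbf{I}_{M-1} + \mathbf{A}_N(\mathbf{J})$ in (50) is PD. Since $2/(\epsilon d^2) > 0$ and $\mathbf{D}_N(\mathbf{J})$ in (50) is also PD (diagonal with positive entries), $\mathbf{Q}(\mathbf{J})$ is PD, which concludes the proof of Lemma 4.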
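Both the matrix form (50)-(51) and the eigenvalue claim for $\mathbf{A}_N(\mathbf{J})$ are easy to spot-check. The following sketch uses hypothetical values of $n_m$ for the non-optimal users, with $n_{m^*} = 1$ removed so that $d = 1 + \mathrm{Tr}(\mathbf{D}_N\mathbf{D}_N)$. It does three things:

- builds $\mathbf{Q}(\mathbf{J})$ entry-wise from (48a)-(48b);
- compares the result with $\frac{2}{\epsilon d^2}\mathbf{D}_N[\mathbf{I}_{M-1} + \mathbf{A}_N]$;
- verifies that $\mathbf{D}_N\mathbf{1}$ is an eigenvector of $\mathbf{A}_N$ with eigenvalue 0, while a vector orthogonal to it is scaled by $\mathrm{Tr}(\mathbf{D}_N\mathbf{D}_N)$.

```python
def q_entrywise(n, eps):
    """Q(J) from (48a)-(48b), non-optimal users only (m* removed, n_{m*} = 1)."""
    d = 1.0 + sum(x * x for x in n)
    size = len(n)
    return [[(2 * n[m] * (d - n[m] ** 2) if q == m else -2 * n[q] ** 2 * n[m]) / (eps * d ** 2)
             for m in range(size)] for q in range(size)]


def q_matrix_form(n, eps):
    """Q(J) = 2/(eps d^2) D_N [I + A_N], with A_N = Tr(D_N^2) I - D_N 1 1^T D_N, per (50)-(51)."""
    tr = sum(x * x for x in n)
    d = 1.0 + tr
    size = len(n)
    a_n = [[(tr if q == m else 0.0) - n[q] * n[m] for m in range(size)] for q in range(size)]
    q_mat = [[2.0 / (eps * d ** 2) * n[q] * ((1.0 if q == m else 0.0) + a_n[q][m])
              for m in range(size)] for q in range(size)]
    return q_mat, a_n


def apply(mat, vec):
    """Matrix-vector product on nested lists."""
    return [sum(r * v for r, v in zip(row, vec)) for row in mat]


n, eps = [0.7, 0.4, 0.9], 0.1          # hypothetical n_m values of the non-optimal users
Q1 = q_entrywise(n, eps)
Q2, A_N = q_matrix_form(n, eps)
tr = sum(x * x for x in n)
null_dir = apply(A_N, n)               # A_N (D_N 1): should vanish
orth = [0.4, -0.7, 0.0]                # orthogonal to D_N 1 = n
scaled = apply(A_N, orth)              # should equal Tr(D_N D_N) * orth
```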

Summarizing, we have proved that $-\mathbf{A}^s(\mathbf{J})$ is PD because it can be written as $-\mathbf{A}^s(\mathbf{J}) = \mathbf{A}^s_R(\mathbf{J}) + \mathbf{A}^s_W(\mathbf{J})$, where $\mathbf{A}^s_R(\mathbf{J})$ is PD and $\mathbf{A}^s_W(\mathbf{J})$ is SPD. Matrix $\mathbf{A}^s_R(\mathbf{J})$ is PD because it is diagonal with positive entries [cf. (46)]. Matrix $\mathbf{A}^s_W(\mathbf{J})$ is SPD because it can be written as $\mathbf{D}_R(\mathbf{J})\mathbf{A}^s_C(\mathbf{J})\mathbf{D}_R(\mathbf{J})$, where $\mathbf{D}_R(\mathbf{J})$ is PD (diagonal with positive entries) and $\mathbf{A}^s_C(\mathbf{J})$ is SPD [cf. Lemmas 3 and 4].

To show Lemma 2-(ii), we only have to show that the eigenvalues of $\mathbf{A}^s(\mathbf{J})$ are bounded. This follows because the entries of both $\mathbf{A}^s_R(\mathbf{J})$ and $\mathbf{A}^s_W(\mathbf{J})$ are bounded. Specifically, the strict convexity of $\mathcal{T}$ guarantees that the nonzero entries of $\mathbf{A}^s_R(\mathbf{J})$ are finite [cf. the denominator in (46)]. In addition, the absolute values of the entries of $\mathbf{A}^s_C(\mathbf{J})$ can be safely upper bounded as follows:

- (48a) by $1/\epsilon$;
- (48b) by $1/\epsilon$;
- (49a) by $2(M-1)/\epsilon$;
- (49b) by $(M-1)/\epsilon$.

Appendix E: Proof of Proposition 4-(ii)

Proposition 4-(ii) provides upper and lower bounds for $D^s(\boldsymbol{\lambda}^{Rs})$, and we prove each separately. Recall that $\boldsymbol{\lambda}^{Rs}$ denotes the limit of the $\epsilon'$-subgradient iteration and $\boldsymbol{\lambda}^{R*}$ the optimal solution of (14).

To prove the upper bound, we rely on Lemma 1-(i), which ensures that $D^s(\boldsymbol{\lambda}^R) \leq D(\boldsymbol{\lambda}^R) + \epsilon'$, $\forall\boldsymbol{\lambda}^R$. Substituting $\boldsymbol{\lambda}^R = \boldsymbol{\lambda}^{Rs}$ into the last inequality yields

$$D^s(\boldsymbol{\lambda}^{Rs}) \leq D(\boldsymbol{\lambda}^{Rs}) + \epsilon'. \tag{52}$$

Moreover, since $\boldsymbol{\lambda}^{R*}$ maximizes $D(\boldsymbol{\lambda}^R)$, it holds that $D(\boldsymbol{\lambda}^{Rs}) \leq D(\boldsymbol{\lambda}^{R*})$. Substituting this condition into (52) gives

$$D^s(\boldsymbol{\lambda}^{Rs}) \leq D(\boldsymbol{\lambda}^{R*}) + \epsilon' \tag{53}$$

which is the upper bound given in Proposition 4-(ii).

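To establish the lower bound, first define the average weighted power consumption as

$$P(\mathbf{R}(\mathbf{J}),\mathbf{W}(\mathbf{J})) := \sum_{\forall\mathbf{J}}\sum_{m=1}^{M}\sum_{k=1}^{K}[\boldsymbol{\mu}]_m\,\mathcal{T}_{[\mathbf{J}]_{m,k}}([\mathbf{R}(\mathbf{J})]_{m,k})[\mathbf{W}(\mathbf{J})]_{m,k}\Pr\{\mathbf{J}\}. \tag{54}$$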

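Since the primal problem has zero duality gap, the optimum primal and dual values coincide; hence

$$P^* = P(\mathbf{R}^*(\mathbf{J},\boldsymbol{\lambda}^{R*}),\mathbf{W}^*(\mathbf{J},\boldsymbol{\lambda}^{R*})) = D(\boldsymbol{\lambda}^{R*}). \tag{55}$$

On the other hand, it holds that

$$P(\mathbf{R}^*(\mathbf{J},\boldsymbol{\lambda}^{Rs}),\mathbf{W}^s(\mathbf{J},\boldsymbol{\lambda}^{Rs})) = D^s(\boldsymbol{\lambda}^{Rs}). \tag{56}$$

Equation (56) follows from a chain of three facts. First, the iterations in Proposition 4-(i) only converge when $\partial^s D(\boldsymbol{\lambda}^{Rs}) = \mathbf{0}$. Second, a zero smooth subgradient requires all the average rate constraints to be satisfied with equality. Third, this implies that the only remaining term in the Lagrangian is $P(\mathbf{R}^*(\mathbf{J},\boldsymbol{\lambda}^{Rs}),\mathbf{W}^s(\mathbf{J},\boldsymbol{\lambda}^{Rs}))$ [cf. (54), (·), and the definition of $D^s(\boldsymbol{\lambda}^{Rs})$ in Lemma 1].

Finally, $\mathbf{R}^*(\mathbf{J},\boldsymbol{\lambda}^{Rs})$ and $\mathbf{W}^s(\mathbf{J},\boldsymbol{\lambda}^{Rs})$ are feasible primal variables, so $P^* \leq P(\mathbf{R}^*(\mathbf{J},\boldsymbol{\lambda}^{Rs}),\mathbf{W}^s(\mathbf{J},\boldsymbol{\lambda}^{Rs}))$. Using (55) and (56), this inequality yields $D(\boldsymbol{\lambda}^{R*}) \leq D^s(\boldsymbol{\lambda}^{Rs})$, which is the lower bound given in Proposition 4-(ii).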

At this point, it is worth clarifying a potentially misleading implication of Proposition 4. Once the exact value of $\boldsymbol{\lambda}^{Rs}$ is found using the iterations in (17), Lemma 1-(i) shows that $D(\boldsymbol{\lambda}^{Rs}) \leq D^s(\boldsymbol{\lambda}^{Rs})$. This implies that using $\mathbf{R}^*(\mathbf{J},\boldsymbol{\lambda}^{Rs})$ and $\mathbf{W}^*(\mathbf{J},\boldsymbol{\lambda}^{Rs})$ as the final solution gives a lower power cost in the Lagrangian in (·) than the smooth pair $\mathbf{R}^*(\mathbf{J},\boldsymbol{\lambda}^{Rs})$ and $\mathbf{W}^s(\mathbf{J},\boldsymbol{\lambda}^{Rs})$.

Nevertheless, $\mathbf{R}^*(\mathbf{J},\boldsymbol{\lambda}^{Rs})$ and $\mathbf{W}^*(\mathbf{J},\boldsymbol{\lambda}^{Rs})$ cannot be used as a better approximation to the optimal solution $\mathbf{R}^*(\mathbf{J},\boldsymbol{\lambda}^{R*})$ and $\mathbf{W}^*(\mathbf{J},\boldsymbol{\lambda}^{R*})$. They may (and most likely will) fail to satisfy the average rate constraints in (1), leading to infeasibility from a primal point of view. In contrast, the primal variables $\mathbf{R}^*(\mathbf{J},\boldsymbol{\lambda}^{Rs})$ and $\mathbf{W}^s(\mathbf{J},\boldsymbol{\lambda}^{Rs})$ give rise to a slightly higher dual objective (thus a higher power cost in the Lagrangian). However, they are guaranteed to be feasible and to tightly satisfy the average rate constraints.

Appendix F: Proof of Proposition 6

Using (8) and (·), we can write $[\mathbf{C}_W]_{m,k} = \mathcal{T}_{[\mathbf{j}]_{m,k}}([\mathbf{R}^*]_{m,k}) - \mathcal{T}'_{[\mathbf{j}]_{m,k}}([\mathbf{R}^*]_{m,k})[\mathbf{R}^*]_{m,k}$. Two conditions hold:

- The convexity of $\mathcal{T}$ guarantees that $d[\mathbf{C}_W]_{m,k}/d[\mathbf{R}]_{m,k} = -\mathcal{T}''_{[\mathbf{j}]_{m,k}}([\mathbf{R}^*]_{m,k})[\mathbf{R}^*]_{m,k} < 0$.
- It is assumed that $[\mathbf{R}(j_{m,k}+1)]_{m,k} > [\mathbf{R}(j_{m,k})]_{m,k}$.

Together, they imply that $[\mathbf{C}_W(j_{m,k}+1)]_{m,k} < [\mathbf{C}_W(j_{m,k})]_{m,k}$, which proves (i). Based on this monotonicity property, we next prove (ii) and (iii).

If a vector $\mathbf{j}'$ belongs to the set in (ii), then

$[\mathbf{C}_W([\mathbf{j}']_{m'})]_{m',k} \geq [\mathbf{C}_W([\mathbf{j}]_{m'})]_{m',k} > [\mathbf{C}_W([\mathbf{j}]_m)]_{m,k} = [\mathbf{C}_W([\mathbf{j}']_m)]_{m,k},\quad \forall m',$

and therefore (ii) follows. The first inequality is due to the condition $[\mathbf{j}']_{m'} \leq [\mathbf{j}]_{m'}$ $\forall m'$ in (ii) and the decreasing behavior of $[\mathbf{C}_W(j_{m,k}+1)]_{m,k}$. The second holds because $m \in \mathcal{M}(\mathbf{j},k)$ but $m' \notin \mathcal{M}(\mathbf{j},k)$. The final equality is due to the condition $[\mathbf{j}']_m = [\mathbf{j}]_m$ in (ii).
If a vector $\mathbf{j}'$ belongs to the set in (iii), then $[\mathbf{j}']_{m'} \geq [\mathbf{j}]_{m'}$. Hence $[\mathbf{C}_W([\mathbf{j}']_{m'})]_{m',k} \leq [\mathbf{C}_W([\mathbf{j}]_{m'})]_{m',k}$ (the better the channel, the lower the cost), and therefore $\min\{[\mathbf{C}_W(\mathbf{j}')]_k\} \leq \min\{[\mathbf{C}_W(\mathbf{j})]_k\}$. Furthermore, since $m \notin \mathcal{M}(\mathbf{j},k)$, it holds that $\min\{[\mathbf{C}_W(\mathbf{j})]_k\} < [\mathbf{C}_W([\mathbf{j}]_m)]_{m,k}$. On the other hand, $[\mathbf{j}']_m = [\mathbf{j}]_m$ implies $[\mathbf{C}_W([\mathbf{j}]_m)]_{m,k} = [\mathbf{C}_W([\mathbf{j}']_m)]_{m,k}$. Combining these observations gives $\min\{[\mathbf{C}_W(\mathbf{j}')]_k\} < [\mathbf{C}_W([\mathbf{j}']_m)]_{m,k}$, which proves (iii).
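Property (i) can be illustrated with a hypothetical power-rate function. The following sketch makes these assumptions:

- $\mathcal{T}_j(x) = (2^x - 1)/g_j$, where $g_j$ is a representative gain of region $j$ that grows with $j$. This is an illustrative choice, not the paper's (23), (25) or (26).
- $[\mathbf{R}^*]$ is obtained from $\mathcal{T}'_j(R) = \lambda$.
- $[\mathbf{C}_W]$ is evaluated as $\mathcal{T}_j(R^*) - \mathcal{T}'_j(R^*)R^*$.

For a fixed multiplier, the cost should then strictly decrease as the region index (channel quality) increases.

```python
import math

LN2 = math.log(2.0)


def optimal_rate(lam, g):
    """Solve T'(R) = lam for the illustrative T(R) = (2**R - 1)/g, clipped at R = 0."""
    return max(0.0, math.log2(lam * g / LN2))


def scheduling_cost(lam, g):
    """C_W = T(R*) - T'(R*) R*, as in the expression used in the proof."""
    r = optimal_rate(lam, g)
    t = (2.0 ** r - 1.0) / g
    t_prime = (2.0 ** r) * LN2 / g
    return t - t_prime * r


lam = 1.0
gains = [1.0, 2.0, 4.0, 8.0]   # hypothetical region gains, increasing with the region index j
costs = [scheduling_cost(lam, g) for g in gains]
print(costs)
```

Under this assumed model the rate also grows with $g_j$, which matches the assumption $[\mathbf{R}(j_{m,k}+1)]_{m,k} > [\mathbf{R}(j_{m,k})]_{m,k}$ used above.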



Appendix G: Proof of Convexity of Eqs. (23), (25) and (26)

To show the convexity of (23), recall that if $x = f^{-1}(y)$ is the inverse function of $y = f(x)$, then $(f^{-1})'(y) = 1/f'(f^{-1}(y))$. Using the chain rule of differentiation, it follows that $(f^{-1})''(y) = -f''(f^{-1}(y))/\left(f'(f^{-1}(y))\right)^3$. Substituting $f = \mathcal{T}^{-1}$ and $f^{-1} = \mathcal{T}$ into the last equality yields

$$\mathcal{T}''(x) = -\frac{(\mathcal{T}^{-1})''(\mathcal{T}(x))}{\left((\mathcal{T}^{-1})'(\mathcal{T}(x))\right)^3}. \tag{57}$$

By the definition of $\mathcal{T}^{-1}$ in (22), it can be readily checked that $(\mathcal{T}^{-1})' > 0$ and $(\mathcal{T}^{-1})'' < 0$. These inequalities imply that (57) is positive, and hence $\mathcal{T}$ is strictly convex.
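Relation (57) can be sanity-checked on a simple pair of inverse functions. For illustration only, the following sketch assumes $\mathcal{T}^{-1}(p) = \log_2(1+p)$, so that $\mathcal{T}(x) = 2^x - 1$. It then compares the right-hand side of (57) with a finite-difference estimate of $\mathcal{T}''$.

```python
import math

LN2 = math.log(2.0)


def t(x):
    """Illustrative T(x) = 2**x - 1 (assumption), the inverse of log2(1 + p)."""
    return 2.0 ** x - 1.0


def t_inv_d1(p):
    """(T^{-1})'(p) for T^{-1}(p) = log2(1 + p)."""
    return 1.0 / ((1.0 + p) * LN2)


def t_inv_d2(p):
    """(T^{-1})''(p) for T^{-1}(p) = log2(1 + p); negative, i.e. T^{-1} is concave."""
    return -1.0 / ((1.0 + p) ** 2 * LN2)


def t_second_via_57(x):
    """Right-hand side of (57)."""
    p = t(x)
    return -t_inv_d2(p) / t_inv_d1(p) ** 3


def t_second_numeric(x, h=1e-4):
    """Central second difference of T."""
    return (t(x + h) - 2.0 * t(x) + t(x - h)) / h ** 2


for x in (0.5, 1.0, 2.0):
    print(x, t_second_via_57(x), t_second_numeric(x))
```

For this pair, (57) reduces to $\ln^2(2)\,2^x$, which is positive, as the argument above requires.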

The convexity of (25) follows readily from the positivity of

$$\mathcal{T}''_{[\mathbf{J}]_{m,k}}(x) = \frac{2^x\ln^2(2)\ln(\kappa_1/\epsilon_{\max})}{\kappa_2\,g_{m,k,[\mathbf{J}]_{m,k}-1}}. \tag{58}$$
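Equation (58) can be checked the same way under an explicit assumption on the form of (25). The following sketch takes $\mathcal{T}_{[\mathbf{J}]_{m,k}}(x) = \ln(\kappa_1/\epsilon_{\max})(2^x - 1)/(\kappa_2 g)$. The values of $\kappa_1$, $\kappa_2$, $\epsilon_{\max}$ and the region threshold $g$ are hypothetical. The closed form is compared with a finite-difference second derivative, and its sign is checked.

```python
import math

KAPPA1, KAPPA2 = 0.2, 1.5      # illustrative constants (assumption)
EPS_MAX, G = 1e-3, 0.8         # illustrative target error rate and region threshold (assumption)


def power(x):
    """Assumed form of (25): power needed for rate x in the region with threshold G."""
    return math.log(KAPPA1 / EPS_MAX) * (2.0 ** x - 1.0) / (KAPPA2 * G)


def power_second_closed_form(x):
    """Right-hand side of (58) for the assumed form."""
    return 2.0 ** x * math.log(2.0) ** 2 * math.log(KAPPA1 / EPS_MAX) / (KAPPA2 * G)


def power_second_numeric(x, h=1e-4):
    """Central second difference of the assumed power function."""
    return (power(x + h) - 2.0 * power(x) + power(x - h)) / h ** 2


for x in (0.5, 2.0, 4.0):
    print(x, power_second_closed_form(x), power_second_numeric(x))
```

Positivity requires $\epsilon_{\max} < \kappa_1$, so that $\ln(\kappa_1/\epsilon_{\max}) > 0$.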



Finally, to show the convexity of (26), first define

f e (x,y):= — / e B ^ k dg m>k - / e ^-i (59) 

Kl ^3m,fe,[Jl m -l ^9m,fe,[j] m -l 

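and rewrite $\mathcal{T}_{[\mathbf{J}]_{m,k}}$ as

$$\mathcal{T}_{[\mathbf{J}]_{m,k}} = \left\{x \rightarrow y : f_\epsilon(x,y) = 0\right\}, \tag{60}$$

where $y$ is uniquely determined by the equation $f_\epsilon(x,y) = 0$. Since $df_\epsilon = \frac{\partial f_\epsilon}{\partial x}dx + \frac{\partial f_\epsilon}{\partial y}dy = 0$, we have

$$\frac{dy}{dx} = -\frac{\partial f_\epsilon/\partial x}{\partial f_\epsilon/\partial y}.$$

Substituting from (59) yields

$$\frac{dy}{dx} = -\frac{\partial f_\epsilon/\partial x}{\partial f_\epsilon/\partial y} = \frac{y\,2^x\ln(2)}{2^x - 1} \tag{61}$$

and, for the second derivative,

$$\frac{d^2y}{dx^2} = \frac{dy}{dx}\,\frac{2^x\ln(2)}{2^x - 1} - y\,\frac{2^x\ln^2(2)}{(2^x - 1)^2} = \frac{y\,2^x\ln^2(2)}{2^x - 1}.$$

Since $x$ and $y$ (rate and power) are positive, it follows readily that $d^2y/dx^2 > 0$.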
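Any $f_\epsilon$ that depends on $(x, y)$ only through $(2^x - 1)/y$ makes the implicit solution take the form $y(x) = (2^x - 1)/\gamma$ for some constant $\gamma$. The following sketch assumes exactly that form, with an arbitrary $\gamma$, and compares (61) and the second-derivative expression with finite differences.

```python
import math

LN2 = math.log(2.0)
GAMMA = 0.35                    # hypothetical constant fixed by f_eps(x, y) = 0


def y_of_x(x):
    """Assumed implicit solution y = (2**x - 1)/GAMMA (power as a function of rate)."""
    return (2.0 ** x - 1.0) / GAMMA


def dy_closed_form(x):
    """Right-hand side of (61)."""
    return y_of_x(x) * 2.0 ** x * LN2 / (2.0 ** x - 1.0)


def d2y_closed_form(x):
    """Closed-form second derivative derived after (61)."""
    return y_of_x(x) * 2.0 ** x * LN2 ** 2 / (2.0 ** x - 1.0)


def derivatives_numeric(x, h=1e-4):
    """Central first and second differences of y(x)."""
    d1 = (y_of_x(x + h) - y_of_x(x - h)) / (2 * h)
    d2 = (y_of_x(x + h) - 2 * y_of_x(x) + y_of_x(x - h)) / h ** 2
    return d1, d2


for x in (0.5, 1.5, 3.0):
    print(x, dy_closed_form(x), d2y_closed_form(x), derivatives_numeric(x))
```

The closed forms do not depend on $\gamma$ once expressed through $y$, which is why (61) needs no further knowledge of the distribution inside (59).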